Applications to Transistors


CS/EE1012 INTRODUCTION TO COMPUTER ENGINEERING
SPRING 2013

LAYERED COMPUTER DESIGN

1. Introduction

CS/EE1012 will study a complete computer system, from applications to hardware. The study will be done in a systematic, structured way, by using computer layers. Layering allows abstraction! One can have a top-down study, which simplifies the process. The top-down approach is practiced frequently in college and industry as it leads to satisfactory results faster. That is, top-down analysis and top-down design are easier in general. For example, a computer can be designed layer by layer, starting with the top layer and ending with the bottom layer, which is the complete design.

Our textbook, on the other hand, covers the material in a bottom-up way, occasionally mixing in top-down. Bottom-up studies are efficient if the layers are tightly coupled to each other and well defined. This is the case for our textbook! We will follow the textbook, deviating from it from time to time.

Figure : Applications to Transistors (the nine computer layers, top to bottom). For layers shown with three text lines, the lines give typical operations, operands and components.

  Application Level
    - Operations : simulating a plane, word processing, controlling an elevator, ...
    - Operands   : aeroplane surface, winds, sheets, characters, elevator buttons, ...
    - Components : abstract machine
  Computational Method Level
    - Abstract machine
  Algorithm Level
    - Abstract machine
  High-level Language Level
    - Abstract machine
  Operating System Level
    - Abstract machine
  Architecture Level (Machine Language Level), the HW/SW interface
    - Operations : add, subtract, multiply, load, store, jump, branch, ...
    - Operands   : 2's complement integers, FP numbers, vectors, ...
    - Components : registers, memory, addressing, I/O, interrupts : abstract machine
  Microarchitecture Level (Organization Level) (Register Transfer Level, RTL)
    - Operations : fetch instruction, increment PC, calculate effective address, ...
    - Operands   : 2's complement numbers, memory addresses, ...
    - Components : CPU (registers, buses, ALUs), memory, I/O : digital systems
  Logic Level
    - Operations : AND, OR, NOT, clocked store on flip-flop, ...
    - Operands   : bit
    - Components : gates (AND, OR, NOT) and flip-flops : digital circuits
  Transistor Level
    - Operations : switch on and off
    - Operands   : voltage levels
    - Components : switches (transistors), resistors, capacitors and wires : electronic circuits

  Left-side labels (who/what implements a layer) : computer scientist; algorithm designer; programmer; systems personnel & OS process, memory, I/O, file, ... managers; computer architect & compiler, linker, loader; computer designer; logic designer; chip designer.
  The figure also marks which layers are software and which are hardware.

In the layered design, a layer is implemented by the layer below. There can be many possibilities to implement a layer by the layer below. To decide which one to use, design goals are applied : speed, cost, size, weight, power consumption, reliability, expandability, flexibility and compatibility. Often, we concentrate on the speed (performance) goal and describe how it can be used to make decisions on the design of a computer.

Polytechnic Institute of NYU    Page 1 of 24    Handout No : 3    February 1, 2013

In the figure above, who/what implements a layer is indicated on the left side. Some layers are shown with three text lines on the right side. The first text line indicates typical operations of the layer, the second text line indicates typical operands of the layer and the third text line indicates typical components of the layer. Finally, it must be noted that ultimately a computer computes by means of its transistors turning (switching) on and off : The transistor layer is the complete design!

It is impossible to cover all the layers of a computer in one course in detail. Colleges offer courses that cover one or a few layers in detail. For example, a Digital Logic course covers the Logic Level, a Computer Architecture course covers the Architecture and Microarchitecture levels, and an Algorithms course covers the Algorithm level. CS/EE1012 will cover all the layers, some in detail and some briefly.

All nine layers shown in the figure above are summarized below. However, before we discuss the layers, we will discuss computer fundamentals, then popular computer classifications and then conclude that the nine layers above give a better view of computers.

1.1. Computer Systems

The fundamentals : A computer processes digital information. In order to do that it runs (executes) a machine language program. As an example, when we buy software, such as Microsoft Word, we buy the machine language program of the Word software. A machine language program manipulates data. A machine language program consists of machine language instructions. A machine language instruction is a simple command that can be immediately understood by the hardware. It commands the computer to perform a simple operation such as add, subtract, and, shift left, etc. Thus, it can be directly run by the computer (hardware).

Machine language instructions and data are in terms of 1s and 0s and are stored in the memory. It is not possible to distinguish whether a part of the memory holds an instruction or a data element by just looking at it. This is a unique property of today's computers, and so they are called stored-program computers.
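The stored-program idea above can be illustrated with a toy machine. The Python sketch below is only an illustration under stated assumptions: the one-accumulator machine, the opcode numbers and the `opcode * 100 + address` word encoding are invented for this example and do not correspond to any real instruction set. It shows that instructions and data sit in the same memory as plain numbers, and that only the fetch-execute loop decides which words are treated as instructions.

```python
# Toy stored-program machine: instructions and data share one memory.
# The opcodes and the word encoding (opcode * 100 + address) are assumptions
# made for this sketch, not a real machine language.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory):
    """Fetch and execute words starting at address 0 until HALT."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # single accumulator register
    while True:
        word = memory[pc]                    # fetch: just a number in memory
        opcode, address = divmod(word, 100)  # decode
        pc += 1
        if opcode == HALT:
            return memory
        if opcode == LOAD:
            acc = memory[address]
        elif opcode == ADD:
            acc += memory[address]
        elif opcode == STORE:
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 1}")


program = [
    LOAD * 100 + 5,   # address 0: acc = memory[5]
    ADD * 100 + 6,    # address 1: acc = acc + memory[6]
    STORE * 100 + 7,  # address 2: memory[7] = acc
    HALT * 100,       # address 3: stop
    0,                # address 4: unused word
    20,               # address 5: data
    22,               # address 6: data
    0,                # address 7: result goes here
]
result = run(list(program))
```

After the run, `result[7]` holds the sum of the two data words. The unused word at address 4 has the same value as the HALT instruction at address 3: nothing in the memory itself marks one as data and the other as an instruction.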
Data and programs are input from input/output (I/O) devices into the computer memory and result data are output to the I/O devices.

[Figure : a computer connected to its I/O devices : printer, disk, mouse, modem, tape, keyboard and display.]

Computers are classified with respect to their size, speed and cost as supercomputers, servers, desktop computers and embedded computers. Supercomputers are the fastest computers; they cost millions of dollars and are very large. They are used for scientific applications, such as airplane design, weather forecasting and molecular simulations. Government agencies and large corporations can afford them. Servers are large computers that allow multiple users to run general-purpose applications. Companies and universities are typical customers. Desktop computers are single-user machines, intended to run a small number of applications ranging from email to word processing. Embedded computers are very small and control the system they are embedded in. They typically have one application to run, which is the control of the system they are in.

1.2. Hardware vs. Software

Another classification is hardware vs. software. Hardware is the collection of physical components, such as chips, wires, PCBs, connectors, I/O devices, etc. that form a computer. Software is the collection of programs on a computer. Software and hardware are equivalent in that any operation performed by the hardware can be built into software and any operation performed by software can also be directly realized by hardware. Therefore, we have the hardware/software trade-off. This equivalence is under the assumption that there is a basic set of operations implemented in hardware. Decisions on what to include in hardware and software are based on the required speed, cost, reliability, frequency of expected changes, etc.

There are two types of software today : Application and systems. The meaning of the two changes from computer to computer. We define application programs as those run by ordinary users, such as email, word processing, spreadsheet, simulation programs, etc. Systems programs are used to control the hardware to make the computer easy to use, secure and more efficient. Systems software includes operating systems, language translators (compilers, assemblers), linkers, loaders and libraries. They are used by systems people who have special privileges (access rights) to use the computer. This distinction is enforced by today's computers in the form of hardware control states : user and system states. Application programs are run in the user state, and if they try to run system software in this state, an interrupt (exception) is generated and the program is terminated. System programs are run in the system state.

Even though software is in machine language, today it is often developed by first writing it in a high-level language, an application-oriented language or an assembly language. High-level languages include C++, Java, C, Fortran, Cobol, Python, PHP, etc. Application-oriented languages contain constructs and keywords to develop a program for a specific class of applications, such as simulating a computer network. Assembly languages are related to the architecture of the processor they are targeted for. That is, for a computer with an Intel Pentium processor, one would develop an assembly language program in the Intel assembly language. If the processor is an IBM Power processor, one would write an IBM Power assembly language program.
Since the computer can run only machine language programs, one needs to translate the above programs to machine language programs. To translate from a high-level language program to the machine language program, compilers are used : C++ compiler, Java compiler, etc. To translate from an assembly language program to the machine language program, assemblers are used : Intel assembler, IBM assembler, etc. To translate from an application-oriented language program to the machine language program, typically preprocessing programs are used to convert it to an intermediate form in a high-level language, which is then compiled to the machine language program.

Among the three types of languages, application-oriented languages are the highest level, meaning very easy to write, and assembly languages are the lowest, meaning hardest to write. Although it is easier to develop application-oriented language programs, their corresponding machine code may not be efficient since preprocessors and compilers may not be sophisticated enough to generate efficient machine code. On the other hand, developing a large assembly language program may not be practical due to the complexity of the language. The common practice today is that for embedded applications assembly and C programs are developed, since embedded programs are not large. For all others, high-level and application-oriented languages are used.

1.3. Architecture vs. Organization (Microarchitecture)

Another computer classification is architecture vs. organization (microarchitecture). The architecture is the set of resources visible to the machine language programmer : Registers, the memory, data representations, addressing modes, instruction formats, control states, I/O controllers, interrupts, etc. Studying the architecture implies working on machine language programs. Although the architecture is often thought to be equivalent to the machine language set of a computer, it is more than that. Still, a major portion of the architecture coverage is devoted to the machine language set.
A related issue in the past was whether the machine language set should be complex (complex instruction set computer, CISC) or simple (reduced instruction set computer, RISC). The debate took place in the 1980s and the first half of the 1990s. It was resolved with RISC as the winner since it allows more efficient pipelining, leads to simpler hardware and makes it easier to increase the clock frequency. The Intel and Motorola machine language sets are CISC. The Sun machine language set is RISC. Why and how the Intel CISC architecture has kept its dominance will be clear later in the semester. But, simply, this has been possible by designing an Intel CPU that converts each Intel CISC instruction to up to three RISC instructions on the fly.

The organization is the set of resources that realizes the architecture, which includes the CPU, the memory and I/O controllers. These are digital systems with registers, buses, ALUs, sequencers, etc. The CPU is responsible for running machine language programs : It runs machine language instructions. Running a machine language instruction is performing a simple operation (command) on data. The memory keeps the programs and data, leading to the stored-program concept of today's computers. I/O controllers interface the I/O devices to the memory and CPU. An I/O controller can control one or more I/O devices. Often the number of I/O devices connected to an I/O controller depends on the speed of the I/O devices. A high-speed I/O device can be controlled by a single I/O controller of its own, while a few slow-speed I/O devices can share a single I/O controller.

The stored-program concept and the generic view of computer organization with at least three digital systems (the CPU, memory and I/O controller) are often attributed to the mathematician John von Neumann. However, there is considerable debate on that.

[Figure : the CPU and the memory connected to three I/O controllers, which serve the disk, display and keyboard.]

A microprocessor contains at the least the CPU, which was the case in the 1970s and early 1980s. Today microprocessors include cache memories, bus interfaces and memory management units. High-performance microprocessors from Intel, AMD, Sun and IBM have these functional units. Some other chips in the market today contain memory and even I/O controllers. These are used for embedded applications and are called microcontrollers, not microprocessors. The reason why the memory and I/O controllers are added is that embedded computers are often required to occupy a small space in the system they are housed in. This approach is needed to reduce the chip count, and hence the physical space.

As the above discussion indicates, looking at a computer from different points of view can be at least distracting, if not confusing, for beginners of computer design : hardware vs. software, different programming languages, operating systems, compilers, assemblers, architecture vs. organization, etc. That is why the concept of computer layers is used to give a comprehensive view of computers at different levels of complexity or abstraction. Abstraction allows reducing the number of details of a layer with a simpler view. In the computer layers figure in the Introduction, a layer is abstracted by the layer just above it.

2. Computer Layers

The Application, Computational Method, Algorithm, High-Level Language, Operating Systems and Architecture layers constitute the software layers. The Architecture, Microarchitecture, Logic and Transistor layers constitute the hardware layers. Each layer, except the Application layer, implements the layer above, following the concept of abstraction. Clearly, the Architecture is the hardware/software interface. A computer architect needs to handle both hardware and software and keep track of advances in both.

2.1. Application Layer : This layer indicates the set of applications intended for the computer! Ideally, all applications can be run on a computer. However, in practice the computer is designed to efficiently run a subset of them. For example, one computer runs scientific applications, a different computer runs business applications, etc. When one designs a computer for a specific set of applications, he/she needs to make sure these applications run fast. Today, in industry, designers use benchmark suites that contain specific applications. Designers from different companies use the same benchmark suite to compare their computers. Popular benchmark suites include Linpack, Livermore Loops, Whetstone, Dhrystone, SPEC CPU 2006, SPECWeb and EDN EEMBC (Embedded Microprocessor Benchmark Consortium benchmark of five classes of applications).

2.2. Computational Methods Layer : This layer is highly theoretical and abstract. The computational method (i) determines the characteristics of items (data and other) and work (operations), (ii) describes how operations initiate each other during execution, i.e. which operation is followed by which, or the order of performing operations, and (iii) implicitly determines the amount of parallelism among the operations. Three types of computational methods are frequently covered in the discussion of this topic : control flow, data flow and demand driven.

Today's computers use the control flow computational method, where the order of operations is specified by the order of instructions in the program. The order implies the execution order, and so the next instruction to perform is the one that follows the current instruction in the program. If one wants to change the order of execution, explicit control instructions (branch, jump, etc.) must be used, hence the name control flow. This explicit sequence of operations obscures parallelism. Thus, control flow is inherently sequential, hindering parallelism and higher speeds. This is the reason why today's supercomputers are very expensive, as they need complex compilers, operating systems, hardware and highly trained parallel algorithm designers and programmers to extract parallelism from sequential programs.

In data flow, an operation starts its execution when all of its operands are available. Since the operand availability determines the order of operations, this method is also called data driven. Many operations can have their operands ready at the same time, and so they can start execution at the same time. Thus, data flow does not hinder parallelism. In fact, the parallelism is explicit to the fullest extent.

In demand driven, an operation starts when its result is demanded. Many operation results can be demanded at the same time, and so they can all start execution in parallel. Demand driven computation also makes parallelism explicit.

Overall, data-flow and demand driven methods are inherently parallel.
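The data-flow firing rule described above (an operation starts when all of its operands are available) can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real data-flow machine: the function `dataflow_run`, the dictionary format for operations and the example expression (a + b) * (c - d) are all assumptions made for this sketch. Operations that become ready together are grouped into a "wave", which shows the parallelism that a control-flow listing of the same three operations would hide.

```python
import operator


def dataflow_run(ops, initial_values):
    """Fire operations as soon as their operands are available.

    ops maps a result name to (function, operand_name_1, operand_name_2).
    Returns (values, waves); each wave lists the operations whose operands
    were all available at the same time, so they could run in parallel.
    """
    values = dict(initial_values)
    pending = dict(ops)
    waves = []
    while pending:
        ready = sorted(
            name for name, (_, a, b) in pending.items()
            if a in values and b in values
        )
        if not ready:
            raise ValueError("some operands never become available")
        # Compute the whole wave from values produced by earlier waves only.
        results = {}
        for name in ready:
            function, a, b = pending.pop(name)
            results[name] = function(values[a], values[b])
        values.update(results)
        waves.append(ready)
    return values, waves


# r = (a + b) * (c - d): the add and the subtract do not depend on each other.
ops = {
    "s": (operator.add, "a", "b"),
    "t": (operator.sub, "c", "d"),
    "r": (operator.mul, "s", "t"),
}
values, waves = dataflow_run(ops, {"a": 1, "b": 2, "c": 7, "d": 3})
```

Here `waves` comes out as two groups: the add and the subtract fire together, and the multiply fires once both of their results exist. A control-flow program would list the same operations one after another, and the independence of the first two would have to be rediscovered by a compiler or by the hardware.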
However, implementing the data-flow and demand driven methods at full scale is not efficient given the current technology.

2.3. Algorithm Layer : The algorithm layer follows the computational method chosen. An algorithm is abstract and short. It is independent of high-level languages. The algorithm for an application specifies the major steps to generate the output. It is a mechanical procedure that consists of a finite number of steps that generate the output. Each step must be precisely stated (definiteness) so that it can be carried out (executed, run) by a computer (effective computability). An algorithm must terminate. That is, it must carry out a finite number of steps (finiteness). Given an application and a computational method, one can derive different algorithms, each with a different combination of speed (the number of steps performed) and size (the space needed, which is the number of steps in the algorithm plus the amount of data used).

Today, for a single-processor (uniprocessor) computer we write a sequential algorithm in the control-flow method. However, if we have a computer with multiple processors (cores), we write a parallel algorithm but still use the control flow method. Below, we discuss three common scientific applications and their sequential algorithms :

Application : Dot Product

    dot = A * B  ==>  $\mathrm{dot} = \sum_{i=1}^{n} A[i] \, B[i]$

    A, B are vectors with n elements

Algorithm :

    dot = 0
    for (1 <= i <= n) do
        dot = dot + (A[i] * B[i])
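As a hedged illustration, the dot product algorithm above can be written as runnable Python. The function name `dot_product` and the length check are additions for this sketch; Python lists are indexed from 0, so the loop runs over 0 .. n-1 instead of 1 .. n.

```python
def dot_product(a, b):
    """Sequential dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    dot = 0
    for i in range(len(a)):        # i = 0 .. n-1 (the handout uses 1 .. n)
        dot = dot + (a[i] * b[i])  # one multiply and one add per element
    return dot


example = dot_product([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6
```

The loop performs n multiplications and n additions one after another, which is the sequential, control-flow formulation discussed in this section.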

Application : SAXPY/DAXPY, a step in Gaussian elimination to solve linear equations

    Y = a * X + Y  ==>  Y[i] = a X[i] + Y[i]

    X, Y are vectors with n elements and a is a scalar

Algorithm :

    for (1 <= i <= n) do
        Y[i] = a * X[i] + Y[i]

Application : Matrix Multiply

    A = B * C  ==>  $A[i,j] = \sum_{k=1}^{n} B[i,k] \, C[k,j]$

    A is an m x p matrix
    B is an m x n matrix
    C is an n x p matrix

    [Figure : A (m x p) = B (m x n) * C (n x p), with row i of B and column j of C marked.]

    A[i,j] is the dot product of row i of B and column j of C

Algorithm :

    for (1 <= i <= m) do
        for (1 <= j <= p) do
            A[i,j] = 0
            for (1 <= k <= n) do
                A[i,j] = A[i,j] + B[i,k] * C[k,j]

2.4. High-Level Language Layer : The algorithm developed for an application is coded in a high-level language, such as Fortran, C, C++, Java, etc. Fortran is still the choice for scientific computing, while C is gaining ground. Note that given an algorithm, different programs are possible, all in the same high-level language. As in the case of algorithms, if a computer is a multicore computer, the programmer must write a parallel program to take advantage of the computer.

2.5. Operating Systems Layer : This layer interfaces with the hardware. That is, it hides hardware details from the high-level language programmer and provides security, stability and fairness in the computing system. Thus, this layer adds more code to run on behalf of the application. The layer also handles interrupts and input/output operations. Parallel operating systems are needed for computers that have multi-core processors.

2.6. Architecture Layer : The architecture layer is the hardware/software interface. Its elements include the machine language instruction set, register sets, the memory and Input/Output structures among others. We will discuss this level considerably and so its description here is kept short. Nevertheless, below we give a brief discussion of a dilemma that computer architects had in the 1980s and early 1990s.
The discussion is given due to its historical significance :

The dilemma : On the one side, a computer architect would like to include complex instructions (floating-point division, string search, etc.) in the instruction set to perform complex operations directly. The opposite decision is not to include complex instructions : only simple instructions. A missing complex operation is implemented by a piece of code. Obviously, running code takes longer than running a single complex instruction, so the simpler machine would be slower than the complex machine. But a complex architecture results in complex hardware with higher costs, longer development times and difficult upgrading. A simple architecture leads to simpler hardware. But we need to use a sophisticated compiler to generate efficient code, since an application with complex operations has to be implemented by pieces of code. One can see that the simpler computer is a slow computer when those applications are run. We must note that both decisions are attractive : A fast machine or a cheaper machine. The fundamental question is this : Are those complex operations needed often? In other words, are those functions executed often? If so, a complex architecture is justified : we must make the common case fast.

The division : There have been two camps in computer hardware that promote two different computer architecture philosophies : the Complex Instruction Set Computer (CISC) and the Reduced (simple) Instruction Set Computer (RISC). Examples of highly CISC microprocessors are the Intel x86 and Motorola 680X0. Examples of highly RISC microprocessors are MIPS and Sun UltraSPARC. A compromise has been tried in the form of hybrid microprocessors. Highly RISC microprocessors gain CISC features when upgraded. Similarly, highly CISC microprocessors gain RISC features when upgraded. An example of a hybrid microprocessor is the IBM PowerPC microprocessor, which is nevertheless marketed as a RISC microprocessor. Currently, the RISC idea is the favorite since it allows efficient pipelining.
Even the Intel x86 architecture relies on RISC execution : Each x86 CISC instruction is converted to up to three RISC operations in the ID (instruction decode) cycle, and these RISC operations are executed in the rest of the CPU hardware as if they were in the instruction set.

To summarize :

a) There is always a speed/cost trade-off where the higher the speed, the higher the cost.
b) The design of a computer (designing its architecture, organization, logic and chip levels) is based on the targeted application set. That is, we choose the application set for our computer, then design the hardware.
c) Making the common case fast is an attractive design rule.

One can see how computer architects would ideally design the architecture of a computer : A number of real, commonly used programs are run and a large set of statistics is obtained, such as how often each instruction is executed, which registers are used, etc. From the statistics, we decide which instructions we have to include in the instruction set, how many registers to include in the register set, the types of addressing modes, the types of data representations, etc. so that we make the common case fast! We also determine how time consuming it would be if some operations were implemented not by instructions (by hardware), but by software (by functions). Clearly, the design focuses on application-architecture interactions such that the architecture is tuned to the applications.

Concentrating on one layer at a time (such as the architecture layer) is attractive from a computer architecture education point of view. This approach is not attractive in practice, because the resulting computer can violate one or more of the design goals (speed, cost, size, power consumption, ...). For example, the computer can be too expensive or too large. Thus, in practice computer designers work on several levels simultaneously, even though they proceed top-down. When an architectural decision is made, its implications on lower levels are examined. If it is concluded that a particular architectural decision can violate a goal, it is abandoned.
For example, if it is decided to allow word boundary crossings on the architecture level, we check its implications on the organization level. We might realize that the corresponding hardware is unnecessarily expensive, so we reverse the decision about word boundary crossings.
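The question raised in this section, whether a complex operation is needed often enough to justify a hardware instruction, can be made concrete with a small weighted-average calculation. The sketch below is an illustration only: the function `average_cycles`, the 2% frequency and all cycle counts are hypothetical numbers chosen for the example, not measurements from any processor, and the model ignores side effects such as a slower clock caused by more complex hardware.

```python
def average_cycles(freq_complex, cycles_simple, cycles_complex):
    """Average cycles per operation for a mix of simple and complex operations.

    freq_complex is the fraction (0..1) of operations that are complex.
    """
    if not 0.0 <= freq_complex <= 1.0:
        raise ValueError("freq_complex must be between 0 and 1")
    return (1.0 - freq_complex) * cycles_simple + freq_complex * cycles_complex


# Hypothetical statistics: 2% of the operations are complex.
# As a single hardware instruction the complex operation takes 20 cycles;
# as a piece of code built from simple instructions it takes 100 cycles.
in_hardware = average_cycles(0.02, 1, 20)
in_software = average_cycles(0.02, 1, 100)
slowdown = in_software / in_hardware
```

With these made-up numbers the software version more than doubles the average cost per operation even though the complex operation is rare. With a smaller frequency the gap shrinks, which is why the statistics gathered from real programs drive the decision.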

2.7. Microarchitecture Layer : This layer consists of digital systems. A computer, which is a digital system, consists of at least three smaller digital systems : the processor (CPU), the memory and the Input/Output controller. A digital system consists of registers, buses, ALUs, sequencers, etc. Other names used for this layer are organization and register transfer level (RTL). We will discuss the microarchitecture layer in depth and so its coverage is short in this section.

2.8. Logic Layer : This layer consists of digital circuits. Digital circuits form the digital systems of the microarchitecture level. Digital circuits use two types of components : gates and flip-flops. A gate outputs 1 or 0, depending on its current input values, i.e. the output now is a function of the inputs now. The most common gates used are AND, OR, NOT, NAND and NOR gates. A flip-flop stores a single bit. To store the bit (1 or 0), a clock signal is used. The rising or falling edge of the clock stores the bit. The most common flip-flops used are D and JK flip-flops. A flip-flop is implemented by using a few gates. Then, we can state that all digital circuits consist of gates! Note that a flip-flop is not memory. The memory chip design is different from the flip-flop design.

There are two types of digital circuits. A combinational circuit contains gates. A combinational circuit changes its output right after an input is changed : the output now is a function of the inputs now. Combinational circuits cannot store information. Examples of combinational circuits are adders, multipliers, comparators, etc. Sequential circuits contain gates and flip-flops. They store past inputs : the output now is a function of the inputs now and past inputs. Examples of sequential circuits are counters, registers, shift registers and sequencers. We will cover the Logic layer in detail. Nevertheless, below, we give a brief introduction to digital logic.

2.8.1. Introduction to Digital Logic

In this section, we present the formal digital circuit fundamentals needed to implement such structures as registers, buses, ALUs and sequencers.
Digital circuits consist of gates and flip-flops. There are two types of digital circuits : combinational circuits and sequential circuits. Combinational circuits use only gates while sequential circuits use both gates and flip-flops. Most real-life circuits are sequential circuits. Combinational circuits are more special-purpose.

The input-output relationship of a digital circuit is important when it is studied. The input-output relationship treats the digital circuit as a black box with inputs and outputs. It relates the outputs to the inputs : every output is described as a function of the inputs. A function is a mathematical entity that precisely describes how an output is determined by its inputs. For example, in the figure below, the digital circuit shown as a black box has three inputs and two outputs (two digital functions) :

[Figure: a black box labeled "Digital Circuit" with inputs a, b, c and outputs $y = f_1(a, b, c)$ and $z = f_2(a, b, c)$]

Functions $f_1$ and $f_2$ precisely indicate when outputs y and z are 1 and when they are 0. That is, $f_1$ specifies the input-output relationship of y to inputs a, b and c, and $f_2$ specifies the input-output relationship of z to inputs a, b and c.

A function for a simple combinational circuit is represented by expressions, minterm lists, truth tables, etc. For complex combinational circuits, we do not use functions, but operation tables. A function for a simple sequential circuit is represented by expressions, state tables and state diagrams. For complex sequential circuits, we do not use functions, but operation diagrams.

For historical reasons, digital circuits are called switching circuits, digital circuit functions are called switching functions and the algebra used to design combinational circuits is called Switching Algebra. Below, we first discuss combinational circuits and then sequential circuits.

2.8.2. Combinational Circuits

A combinational circuit changes its output when its inputs are changed. This is because it uses gates, which change their outputs when their inputs are changed. The time it takes for a gate output to change is several hundred picoseconds today. Therefore, a combinational circuit changes its output within a time on the order of nanoseconds. There is no time dimension in combinational circuits, since their outputs depend only on the current inputs.

A number of combinational circuits are frequently used in digital systems : multiplexers, decoders, encoders, demultiplexers, adders, comparators and parity checkers. Their design has been studied extensively in the literature. Students can take a look at books on digital logic to understand their operation.

A typical combinational design follows the rule of thumb that if the circuit has 4 inputs or less, it is designed directly by using Switching Algebra. Otherwise, the circuit is partitioned into smaller and smaller blocks until they have 4 inputs or less, or until they are popular digital circuits for which designs are available.

2.8.2.1. Switching Algebra

George Boole, in 1854, introduced a systematic treatment of logic and developed for this purpose an algebraic system now called Boolean Algebra. This algebraic system, (S ; + ; . ; ¯ ; 0 ; 1), consists of a set S of elements, binary operations + (called "plus") and . (called "dot"), a unary operation ¯ (an overbar, called "complement") and at least two elements 0 and 1 in the set S. E. V. Huntington in 1904 defined many-valued Boolean Algebra with six postulates (axioms). Below, we list those that are relevant to our discussions :

PIa) $k + 0 = k$ : 0 is the identity element with respect to +
PIb) $k \cdot 1 = k$ : 1 is the identity element with respect to .
PIIa) $k + m = m + k$ : + is commutative
PIIb) $k \cdot m = m \cdot k$ : . is commutative
PIIIa) $k + (m \cdot p) = (k + m) \cdot (k + p)$ : + is distributive over .
PIIIb) $k \cdot (m + p) = (k \cdot m) + (k \cdot p)$ : . is distributive over +
PIVa) For every element k of set S, there exists an element $\overline{k}$ (complement of k) of set S, such that $k + \overline{k} = 1$
PIVb) For every element k of set S, there exists an element $\overline{k}$ (complement of k) of set S, such that $k \cdot \overline{k} = 0$

Above, we do not mention the number of elements in S and how the operators manipulate the elements of S. In fact, one can formulate many Boolean Algebras depending on the number of elements in S and the operator definitions. Among all Boolean Algebras, the two-valued (two-element) Boolean Algebra, also known as Switching Algebra, is the most widely known and studied. Switching Algebra was developed by C. Shannon in 1938. Switching Algebra is defined by

- the algebraic system (0 ; 1 ; + ; . ; ¯) and
- the six postulates above, under the condition that
- the following operator rules on (0 ; 1) apply, i.e. the operator definitions :

Definition of the AND operator :

| k | m | k . m |
|---|---|-------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Definition of the OR operator :

| k | m | k + m |
|---|---|-------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Definition of the Complement (NOT, Invert) operator :

| k | $\overline{k}$ |
|---|---|
| 0 | 1 |
| 1 | 0 |

Today, each operator above is directly implemented by electronic circuits. That is, a digital electronic circuit with transistors, capacitors, resistors and other electronic components is designed to perform the specific logic operation. Each such digital electronic circuit for an operator is called a gate. The figure below shows the schematic symbols of gates for the three operators above :

[Figure: schematic symbols of the 2-input AND gate (inputs k and m, output $k \cdot m$), the 2-input OR gate (inputs k and m, output $k + m$) and the NOT gate (input k, output $\overline{k}$)]

AND and OR gates can have any number of inputs, as long as there are at least two inputs. However, a NOT gate always has a single input.

[Figure: a 3-input AND gate (inputs k, m, p, output $k \cdot m \cdot p$) and a 3-input OR gate (inputs k, m, p, output $k + m + p$)]

There are other operators, such as NAND, NOR, EXOR and EXNOR, that are implemented by gates. Today, the electronic circuits on a chip implement anywhere from several gates to millions of gates. There are chips, such as microprocessor chips, with hundreds of millions of transistors. Chip densities have increased at the rate of Moore's Law since the 1960s : the number of transistors on a chip doubles every two years.

Since 1938, theorems have been developed in Switching Algebra. These theorems follow from and satisfy the postulates and operator definitions given above :

TI) Idempotency : a) $k + k = k$   b) $k \cdot k = k$
TII) Null elements : a) $k + 1 = 1$   b) $k \cdot 0 = 0$
TIII) Absorption : a) $k + (k \cdot m) = k$   b) $k \cdot (k + m) = k$
TIV) Involution : $\overline{(\overline{k})} = k$
TV) Associativity : a) $k + (m + p) = (k + m) + p = k + m + p$   b) $k \cdot (m \cdot p) = (k \cdot m) \cdot p = k \cdot m \cdot p$
TVI) a) $k + (\overline{k} \cdot m) = k + m$   b) $k \cdot (\overline{k} + m) = k \cdot m$
TVII) DeMorgan's theorems : a) $\overline{(k + m)} = \overline{k} \cdot \overline{m}$   b) $\overline{(k \cdot m)} = \overline{k} + \overline{m}$

The AND operator symbol : In order to reduce the number of symbols in expressions, we will not show the . symbol between variables. We will imply there is an AND operation between them : $a \cdot b \cdot c = abc$

The truth table for a combinational circuit shows the output value for every input combination. For example, for an imaginary 4-input circuit, the truth table given next relates the output to the inputs. The truth table describes the function, i.e. the input/output relationship.
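Because Switching Algebra has only the two elements 0 and 1, each theorem above can be checked by trying every input combination. The following is a minimal Python sketch of that check. The function names (`AND`, `OR`, `NOT`, `holds`) and the dictionary layout are illustrative choices, not part of the handout, and the placement of the complements in TIV, TVI and TVII assumes the standard forms of these theorems.

```python
from itertools import product

# Operator definitions of Switching Algebra on the set {0, 1}
def AND(k, m): return k & m
def OR(k, m): return k | m
def NOT(k): return 1 - k

# Each theorem is a pair (left-hand side, right-hand side) of functions of k, m, p
THEOREMS = {
    "TIa":   (lambda k, m, p: OR(k, k),               lambda k, m, p: k),
    "TIb":   (lambda k, m, p: AND(k, k),              lambda k, m, p: k),
    "TIIa":  (lambda k, m, p: OR(k, 1),               lambda k, m, p: 1),
    "TIIb":  (lambda k, m, p: AND(k, 0),              lambda k, m, p: 0),
    "TIIIa": (lambda k, m, p: OR(k, AND(k, m)),       lambda k, m, p: k),
    "TIIIb": (lambda k, m, p: AND(k, OR(k, m)),       lambda k, m, p: k),
    "TIV":   (lambda k, m, p: NOT(NOT(k)),            lambda k, m, p: k),
    "TVa":   (lambda k, m, p: OR(k, OR(m, p)),        lambda k, m, p: OR(OR(k, m), p)),
    "TVb":   (lambda k, m, p: AND(k, AND(m, p)),      lambda k, m, p: AND(AND(k, m), p)),
    "TVIa":  (lambda k, m, p: OR(k, AND(NOT(k), m)),  lambda k, m, p: OR(k, m)),
    "TVIb":  (lambda k, m, p: AND(k, OR(NOT(k), m)),  lambda k, m, p: AND(k, m)),
    "TVIIa": (lambda k, m, p: NOT(OR(k, m)),          lambda k, m, p: AND(NOT(k), NOT(m))),
    "TVIIb": (lambda k, m, p: NOT(AND(k, m)),         lambda k, m, p: OR(NOT(k), NOT(m))),
}

def holds(lhs, rhs):
    """True if lhs and rhs agree on all 8 combinations of (k, m, p)."""
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=3))

results = {name: holds(lhs, rhs) for name, (lhs, rhs) in THEOREMS.items()}
for name, ok in results.items():
    print(name, "holds" if ok else "FAILS")
```

Exhaustive checking works here only because the set S has two elements; it verifies the theorems for Switching Algebra, not for Boolean Algebras in general.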

[Figure: a black box labeled "Digital Circuit" with inputs A, B, C, D and output f(A, B, C, D)]

Truth Table :

| Input combination | A | B | C | D | f(A, B, C, D) |
|----|---|---|---|---|---|
| 0  | 0 | 0 | 0 | 0 | 0 |
| 1  | 0 | 0 | 0 | 1 | 1 |
| 2  | 0 | 0 | 1 | 0 | 0 |
| 3  | 0 | 0 | 1 | 1 | 1 |
| 4  | 0 | 1 | 0 | 0 | 0 |
| 5  | 0 | 1 | 0 | 1 | 1 |
| 6  | 0 | 1 | 1 | 0 | 0 |
| 7  | 0 | 1 | 1 | 1 | 1 |
| 8  | 1 | 0 | 0 | 0 | 0 |
| 9  | 1 | 0 | 0 | 1 | 1 |
| 10 | 1 | 0 | 1 | 0 | 0 |
| 11 | 1 | 0 | 1 | 1 | 0 |
| 12 | 1 | 1 | 0 | 0 | 0 |
| 13 | 1 | 1 | 0 | 1 | 1 |
| 14 | 1 | 1 | 1 | 0 | 0 |
| 15 | 1 | 1 | 1 | 1 | 1 |

Traditionally, the OR operation is called sum and the AND operation is called product, since OR behaves similarly to addition and AND behaves similarly to multiplication. One can develop an expression that focuses on the 1s of the output, such that it has as many terms as there are 1s on the output. Each term is an input combination that generates a 1 for the output. Such an expression has product terms summed and is called the canonical SOP (sum of products) expression. Each product term is a canonical product term and has all the inputs of the function. For example, on the above truth table, there are seven 1s, therefore the canonical SOP expression has seven canonical product terms. The canonical SOP expression of the above truth table is the following :

$$f(A, B, C, D) = \overline{A}\,\overline{B}\,\overline{C}D + \overline{A}\,\overline{B}CD + \overline{A}B\overline{C}D + \overline{A}BCD + A\overline{B}\,\overline{C}D + AB\overline{C}D + ABCD$$

The expression implements the function because when a canonical product generates a 1, the whole expression becomes 1 : 1 OR any term is 1. All we have to do is to show that each canonical product term corresponds to an input combination. Here, we show the correspondence between the first canonical product term and the top input combination that generates a 1 on the truth table. The first term, which is $\overline{A}\,\overline{B}\,\overline{C}D$, generates a 1 if $\overline{A}$, $\overline{B}$, $\overline{C}$ and D are all 1s. In order for this to happen, A, B and C must be 0 and D must be 1, so that $\overline{0}$ AND $\overline{0}$ AND $\overline{0}$ AND 1 is 1 AND 1 AND 1 AND 1, which is 1. We see that if an input is complemented in the term, the input must be 0, otherwise it must be 1, to determine the input combination. Then, the input combination that corresponds to $\overline{A}\,\overline{B}\,\overline{C}D$ is 1, since it is 0001 on the truth table.
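The other canonical product terms are for the remaining six input combinations 3, 5, 7, 9, 13 and 15 :

| Canonical product term | A | B | C | D | Input combination |
|---|---|---|---|---|---|
| $\overline{A}\,\overline{B}\,\overline{C}D$ | 0 | 0 | 0 | 1 | 1 |
| $\overline{A}\,\overline{B}CD$ | 0 | 0 | 1 | 1 | 3 |
| $\overline{A}B\overline{C}D$ | 0 | 1 | 0 | 1 | 5 |
| $\overline{A}BCD$ | 0 | 1 | 1 | 1 | 7 |
| $A\overline{B}\,\overline{C}D$ | 1 | 0 | 0 | 1 | 9 |
| $AB\overline{C}D$ | 1 | 1 | 0 | 1 | 13 |
| $ABCD$ | 1 | 1 | 1 | 1 | 15 |

If we have an expression for a function, we can use Switching Algebra to simplify it. Below, we give the simplification of a complex expression that describes function f. The rule applied to obtain each line is shown on its right :

$$\begin{aligned}
f(A, B, C, D) &= D(AB + \overline{C}) + \overline{A}BCD + \overline{A}\,\overline{B}D && \text{a nonminimal expression for function } f \\
&= ABD + \overline{C}D + \overline{A}D(\overline{B} + BC) && k(m + s) = km + ks \\
&= ABD + \overline{C}D + \overline{A}D(\overline{B} + C) && k + \overline{k}m = k + m \\
&= ABD + \overline{C}D + \overline{A}\,\overline{B}D + \overline{A}CD && k(m + s) = km + ks \\
&= ABD + \overline{A}\,\overline{B}D + D(\overline{C} + C\overline{A}) && k(m + s) = km + ks \\
&= ABD + \overline{A}\,\overline{B}D + \overline{C}D + \overline{A}D && k + \overline{k}m = k + m \ \&\ k(m + s) = km + ks \\
&= ABD + \overline{C}D + \overline{A}D(1 + \overline{B}) && k(m + s) = km + ks \\
&= ABD + \overline{C}D + \overline{A}D && k + 1 = 1 \ \&\ k1 = k \\
&= \overline{C}D + D(\overline{A} + AB) && k(m + s) = km + ks \\
&= \overline{C}D + \overline{A}D + BD && k + \overline{k}m = k + m \ \&\ k(m + s) = km + ks
\end{aligned}$$

All the lines above are expressions for function f. The last line is the minimal SOP expression.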
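The correspondence between the truth table, the canonical SOP expression and the simplified expression can be checked mechanically. The sketch below is a minimal Python illustration under stated assumptions: the helper names are hypothetical, a prime (') stands for the overbar in the printed output, and the complements in the nonminimal and minimal expressions are placed as implied by the rule annotations of the derivation, since the overbars are not legible in this copy of the handout.

```python
from itertools import product

MINTERMS = {1, 3, 5, 7, 9, 13, 15}  # rows of the truth table where f = 1

def f_table(A, B, C, D):
    """Truth-table view: the row number is the unsigned value of ABCD."""
    return int((A << 3 | B << 2 | C << 1 | D) in MINTERMS)

def canonical_sop(minterms, names="ABCD"):
    """Build the canonical SOP text; a prime (') marks a complemented input."""
    terms = []
    for m in sorted(minterms):
        bits = format(m, "0{}b".format(len(names)))
        terms.append("".join(n if b == "1" else n + "'" for n, b in zip(names, bits)))
    return " + ".join(terms)

def f_nonminimal(A, B, C, D):
    """Starting expression: D(AB + C') + A'BCD + A'B'D (assumed complements)."""
    nA, nB, nC = 1 - A, 1 - B, 1 - C
    return (D & ((A & B) | nC)) | (nA & B & C & D) | (nA & nB & D)

def f_minimal(A, B, C, D):
    """Minimal SOP expression: C'D + A'D + BD (assumed complements)."""
    nA, nC = 1 - A, 1 - C
    return (nC & D) | (nA & D) | (B & D)

rows = list(product((0, 1), repeat=4))
equivalent = all(f_table(*r) == f_nonminimal(*r) == f_minimal(*r) for r in rows)
print(canonical_sop(MINTERMS))
print("all three forms agree:", equivalent)
```

Comparing all 16 rows is the brute-force counterpart of the algebraic simplification: it confirms that two expressions describe the same function, but it does not show that an expression is minimal.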

Once we have the minimal SOP expression, we draw the combinational circuit (the gate network) as the last step. The resulting gate network is what we call the 2-level AND-OR gate network. Below, we show the minimal 2-level AND-OR gate network for the above function :

[Figure: the 2-level minimal AND-OR gate network for $f(A, B, C, D) = \overline{A}D + BD + \overline{C}D$ : inverters on A and C, three 2-input AND gates each with D as one input, and one OR gate]

Note that the above circuit has three levels : a level of inverters, a level of AND gates and a level with the OR gate. However, these gate networks are still called 2-level AND-OR gate networks and we will keep that name. The reason why we try to obtain 2-level AND-OR gate networks is that they are the fastest possible, as explained below. Note that a 1-level gate network has only one gate, which cannot be useful for real-life applications.

2.8.2.2. Combinational Circuit Speed

Gates in combinational circuits output values based on the inputs. If an input is changed, the output changes after a short delay which is called the propagation (gate) delay, $t_p$. This delay is a function of the electrical properties of the gate, including the process of the gate. We describe the process in the Transistor Layer section. Today, this delay is a few nanoseconds or less. Note that when we mention the speed of a gate, we mean its gate delay : the shorter the delay, the faster the gate.

[Figure: the gate delay $t_p$ from a change on an input to the resulting change on output y]

The gate delay is one of the three factors that determine the speed of a combinational circuit. The other two factors are the longest path from an input to the output, i.e. the number of gate levels, and the wire delays between the gates. 2-level gate networks have the shortest path from an input to the output and so they are the fastest circuits. Wire delays also contribute to the speed, since it takes some time for signals to travel from one gate to another. Although 2-level gate networks are satisfactory for high speed, they result in expensive circuits.

2.8.2.3. Examples of Combinational Circuit Design

Below, we give examples of designing simple combinational circuits, algebraic simplifications and Karnaugh map simplification so that students understand the purpose of the postulates and theorems.

A 2-to-4 Binary Decoder

The most common decoder is the binary decoder, which has k data inputs and $2^k$ outputs. If the decoder is a 2-to-4 decoder, then k is 2 and so there are $2^2 = 4$ outputs. The k inputs represent an unsigned binary number. The outputs decode the unsigned number represented by the k inputs. For example, if the inputs represent $(3)_{10}$, output line 3 is 1 and the other output lines are 0. Below, the development of the 2-to-4 decoder is shown.
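The decoder behavior described above (exactly one of the $2^k$ output lines is 1, namely the line whose number equals the unsigned value on the inputs) can be sketched in a few lines of Python. The function name `decoder` and the most-significant-bit-first input order are assumptions made for this illustration. Each output is computed as one AND over all inputs, complemented where needed, which mirrors a one-AND-gate-per-output structure.

```python
def decoder(inputs):
    """k-to-2^k binary decoder.

    inputs: list of bits, most significant first, e.g. [I1, I0].
    Returns [Y0, Y1, ..., Y(2^k - 1)] with exactly one output set to 1.
    """
    k = len(inputs)
    outputs = []
    for n in range(2 ** k):
        # Output n is the AND of every input, complemented where the
        # corresponding bit of n is 0
        y = 1
        for pos, bit in enumerate(inputs):
            want = (n >> (k - 1 - pos)) & 1
            y &= bit if want else 1 - bit
        outputs.append(y)
    return outputs

print(decoder([1, 1]))  # inputs represent (3)10, so only Y3 is 1
```

The same function covers 3-to-8 and 4-to-16 decoders by passing three or four input bits.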

[Figure: the 2-to-4 decoder as a black box with inputs I1, I0 and outputs Y0, Y1, Y2, Y3. Inputs (I1, I0) represent an unsigned binary number]

| I1 | I0 | Y3 | Y2 | Y1 | Y0 |
|----|----|----|----|----|----|
| 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 |

$$Y0 = \overline{I1}\,\overline{I0} \qquad Y1 = \overline{I1}\,I0 \qquad Y2 = I1\,\overline{I0} \qquad Y3 = I1\,I0$$

[Figure: the gate network of the 2-to-4 decoder : inverters on I1 and I0 and four 2-input AND gates generating Y0, Y1, Y2 and Y3]

The decoder requires at most two gate levels to generate its outputs. Therefore, the decoder is fast. Note that there are 3-to-8, 4-to-16, etc. decoders whose operation and implementation follow similarly. We will use a 4-to-16 decoder when we implement hardwiring later in the semester. Today's memory chips (DRAM, SRAM, ROM, etc.) have large binary decoders.

For a k-input decoder, there are $2^k$ AND gates. Each input is connected to half the number of AND gates. For small decoders, this is not a major problem. But for large decoders, it is a problem which is called fan-out. The fan-out of a line is a number which indicates how many inputs can be connected to it. If the number is exceeded, there are electrical problems and so the circuit may not work. It is for this reason that the decoders of memory chips have gate networks with more than two levels, to reduce the fan-out requirement. However, with more levels, the decoder is slower and, therefore, the memory chip is slower.

A (1-bit) 2-to-1 Multiplexer

A 1-bit 2-to-1 Multiplexer (MUX) is a selector which selects one of two inputs based on a select signal. As seen below, it has three inputs and one output. Two inputs (b and c) are data inputs, one of which is output. The third input (a) is the control input, the select input. The single output is always equal to either b or c at any time. The MUX is a 1-bit MUX since, when an input is selected, there is only one data line selected. As the gate network shows, the MUX has three gate delays.
[Figure: the 1-bit 2-to-1 MUX as a black box with inputs a, b, c and output y(a, b, c). If a = 0 then y = b, else if a = 1 then y = c]

| Input combination | a | b | c | y(a, b, c) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 |
| 3 | 0 | 1 | 1 | 1 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 1 | 0 | 1 | 1 |
| 6 | 1 | 1 | 0 | 0 |
| 7 | 1 | 1 | 1 | 1 |

Output y is 1 when the input combination is 2 (010) or 3 (011) or 5 (101) or 7 (111) :

$$\begin{aligned}
y(a, b, c) &= \overline{a}b\overline{c} + \overline{a}bc + a\overline{b}c + abc \\
&= \overline{a}b(\overline{c} + c) + ac(\overline{b} + b) && k(m + p) = km + kp \\
&= \overline{a}b \cdot 1 + ac \cdot 1 && k + \overline{k} = 1 \\
&= \overline{a}b + ac && k1 = k
\end{aligned}$$

[Figure: the gate network of the 1-bit 2-to-1 MUX generating $y = \overline{a}b + ac$]

One can develop 2-bit 2-to-1 MUXes, 4-bit 2-to-1 MUXes, etc., by using a number of 1-bit 2-to-1 MUXes as described in the next section. Note also that there are k-bit 4-to-1 MUXes, k-bit 8-to-1 MUXes, etc.

A 4-bit 2-to-1 Multiplexer

A 4-bit 2-to-1 MUX has two sets of data inputs. Each set of data inputs has four bits. Thus, the MUX has four outputs carrying the values of the four input lines selected. The MUX has 9 inputs and 4 outputs. The single-bit input is a and the 4-bit inputs are K and M. The 4-bit output is Y. The MUX outputs K if a is 0 and outputs M if a is 1. The black box view and implementation of the 4-bit 2-to-1 MUX are given below.
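As a quick check of the MUX expression, and as a preview of how a wider MUX is assembled from 1-bit MUXes, here is a minimal Python sketch. The function names `mux1` and `mux4` and the list representation of K and M are hypothetical choices for this illustration. The expression $y = \overline{a}b + ac$ is the simplified form that matches the behavior stated above (y = b when a = 0, y = c when a = 1).

```python
from itertools import product

def mux1(a, b, c):
    """1-bit 2-to-1 MUX, minimal SOP: y = a'b + ac (a is the select input)."""
    return ((1 - a) & b) | (a & c)

def mux4(a, K, M):
    """4-bit 2-to-1 MUX built from four 1-bit MUXes sharing select input a.

    K and M are lists of four bits, [K3, K2, K1, K0] and [M3, M2, M1, M0].
    """
    return [mux1(a, k, m) for k, m in zip(K, M)]

# Input combinations (a, b, c) for which y is 1; the row number is the value of abc
ones = {n for n, (a, b, c) in enumerate(product((0, 1), repeat=3)) if mux1(a, b, c)}
print(sorted(ones))                           # expected: combinations 2, 3, 5 and 7
print(mux4(0, [1, 0, 1, 0], [0, 1, 1, 1]))    # a = 0 selects K
print(mux4(1, [1, 0, 1, 0], [0, 1, 1, 1]))    # a = 1 selects M
```

Reusing `mux1` four times reflects the partitioning idea: a 9-input circuit is too large to design directly with Switching Algebra, but each of its four identical blocks has only three inputs.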

[Figure: the 4-bit 2-to-1 MUX as a black box with select input a, 4-bit inputs K (K3, K2, K1, K0) and M (M3, M2, M1, M0), and 4-bit output Y (Y3, Y2, Y1, Y0)]

Since there are 9 inputs, we need to partition the circuit into simpler pieces! We have to obtain the operation table of the 4-bit 2-to-1 MUX :

| a | Operation |
|---|---|
| 0 | Y = K |
| 1 | Y = M |

The major operations are not clear on this operation table. We need a different, more detailed operation table :

| a | Operation |
|---|---|
| 0 | Y3 = K3 ; Y2 = K2 ; Y1 = K1 ; Y0 = K0 |
| 1 | Y3 = M3 ; Y2 = M2 ; Y1 = M1 ; Y0 = M0 |

If a = 0 then Y = K, else if a = 1 then Y = M.

There are four identical major operations : 1-bit 2-to-1 MUXing! We partition the 4-bit 2-to-1 MUX into four blocks. Each block is a 1-bit 2-to-1 MUX, which we have designed by using Switching Algebra. It has three inputs and one output :

[Figure: four 1-bit 2-to-1 MUX blocks sharing select input a, with data inputs (K3, M3), (K2, M2), (K1, M1), (K0, M0) and outputs Y3, Y2, Y1, Y0]

The 4-bit 2-to-1 MUX is then as follows :

[Figure: the gate network of the 4-bit 2-to-1 MUX, with inputs a, K3 to K0 and M3 to M0 and outputs Y3 to Y0]

A 1-bit Adder, Full Adder

A 1-bit adder, the Full ADDer (FA), adds two 1-bit numbers plus a carry input. Therefore, it adds three bits. It has 3 inputs and 2 outputs as shown below.

[Figure: the Full Adder (FA) as a black box with inputs a, b, c and outputs $c_{out}(a, b, c)$ and $sum(a, b, c)$. The 1-bit ADDer computes a + b + c, giving the 2-bit result ($c_{out}$, sum)]

We obtain the truth table, from which we obtain the canonical SOP expressions :

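| Input combination | a | b | c | $c_{out}(a, b, c)$ | $sum(a, b, c)$ |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 |
| 2 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 1 | 1 | 1 | 0 |
| 4 | 1 | 0 | 0 | 0 | 1 |
| 5 | 1 | 0 | 1 | 1 | 0 |
| 6 | 1 | 1 | 0 | 1 | 0 |
| 7 | 1 | 1 | 1 | 1 | 1 |

Output $c_{out}$ is 1 when the input combination is 3 (011) or 5 (101) or 6 (110) or 7 (111). Output sum is 1 when the input combination is 1 (001) or 2 (010) or 4 (100) or 7 (111) :

$$c_{out}(a, b, c) = \overline{a}bc + a\overline{b}c + ab\overline{c} + abc$$

$$sum(a, b, c) = \overline{a}\,\overline{b}c + \overline{a}b\overline{c} + a\overline{b}\,\overline{c} + abc$$

The sum(a, b, c) expression is also the minimal expression. It cannot be simplified. The $c_{out}$ expression is simplified as follows :

$$\begin{aligned}
c_{out}(a, b, c) &= \overline{a}bc + a\overline{b}c + ab\overline{c} + abc \\
&= bc(\overline{a} + a) + a\overline{b}c + ab\overline{c} && k(m + p) = km + kp \\
&= bc + a\overline{b}c + ab\overline{c} && k + \overline{k} = 1 \ \&\ k1 = k \\
&= c(b + a\overline{b}) + ab\overline{c} && k(m + p) = km + kp \\
&= c(b + a) + ab\overline{c} && k + \overline{k}m = k + m \\
&= bc + ac + ab\overline{c} && k(m + p) = km + kp \\
&= a(c + b\overline{c}) + bc && k(m + p) = km + kp \\
&= a(c + b) + bc && k + \overline{k}m = k + m \\
&= ac + ab + bc && k(m + p) = km + kp
\end{aligned}$$

The 2-level AND-OR gate networks are as follows :

[Figure: the gate network for $sum(a, b, c) = \overline{a}\,\overline{b}c + \overline{a}b\overline{c} + a\overline{b}\,\overline{c} + abc$, with 3 gate delays, and the gate network for $c_{out}(a, b, c) = ac + ab + bc$, with 2 gate delays]

Therefore, a Full Adder takes 3 gate delays to generate the sum output (sum(a, b, c)) and 2 gate delays to generate the carry out ($c_{out}(a, b, c)$).

A 32-bit Ripple-Carry Adder

An important component of a digital system is the ALU, which has an adder, a multiplier, AND, OR and other functional units. The adder is the most critical one, since its speed partly determines the clock frequency of the digital system. CPUs typically have a 32-bit adder which has to complete its operation in one or a few clock periods. Thus, a high-speed adder has to be designed.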
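The Full Adder expressions can be checked against ordinary addition, and chaining Full Adders carry-to-carry gives the ripple-carry structure that this subsection goes on to develop. The following minimal Python sketch illustrates both. The function names are hypothetical, the operands are treated simply as 32-bit patterns, and the delay model assumes every stage has the figures stated above (2 gate delays for the carry, 3 for the sum) with all data inputs available at time 0.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (c_out, sum) computed from the SOP expressions of the Full Adder."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    c_out = (a & c) | (a & b) | (b & c)
    s = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    return c_out, s

def ripple_carry_add(k, m, c_in=0, width=32):
    """Add two width-bit patterns by chaining Full Adders from bit 0 upward."""
    result, carry = 0, c_in
    for i in range(width):
        carry, s = full_adder((k >> i) & 1, (m >> i) & 1, carry)
        result |= s << i
    return carry, result

def worst_case_delay(width=32, carry_delay=2, sum_delay=3):
    """Gate delays until the last sum bit settles when the carry ripples
    through every stage: (width - 1) carries, then one sum."""
    return (width - 1) * carry_delay + sum_delay

# The pair (c_out, sum) must equal the 2-bit value of a + b + c
fa_ok = all(2 * co + s == a + b + c
            for a, b, c in product((0, 1), repeat=3)
            for co, s in [full_adder(a, b, c)])
print("full adder matches a + b + c:", fa_ok)
print(ripple_carry_add(0xFFFFFFFF, 1))   # carry ripples through all 32 stages
print(worst_case_delay())                # gate delays for a 32-bit chain
```

With a gate delay of 1 ns, the worst-case figure returned by `worst_case_delay()` translates directly into nanoseconds, which is why the carry path is the part of the adder that faster designs try to shorten.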

[Figure: the 32-bit adder as a black box computing R = K + M + $c_{in}$, with carry output $c_{out}$. K, M and R are 32-bit 2's complement binary numbers. Bit by bit, the adder adds K31 ... K0, M31 ... M0 and $c_{in} = c_0$, producing R31 ... R0 and $c_{out} = c_{32}$]

A 32-bit adder adds two 32-bit numbers plus a carry input. It has 65 inputs and 33 outputs. Our starting point for designing a high-speed adder is the 32-bit Ripple-Carry Adder, which is the slowest that can be designed. The Ripple-Carry Adder has 32 1-bit adders, known as Full Adders :

[Figure: 32 Full Adders in a chain. The Full Adder for bit i adds $K_i$, $M_i$ and $c_i$, producing $R_i$ and $c_{i+1}$. The carry input is $c_{in} = c_0$ and the carry output is $c_{out} = c_{32}$]

Each one of our 32 Full Adders has the above two gate networks. By using their two gate delays, we can obtain the worst-case addition time for our 32-bit Ripple-Carry Adder :

[Figure: the same chain annotated with gate delays. The inputs are available at time 0. Carry $c_1$ is ready after 2 gate delays, $c_2$ after 4, $c_3$ after 6, ..., $c_{31}$ after 62 and $c_{32}$ after 64. R0 is ready after 3 gate delays, R1 after 5, R2 after 7, ..., R29 after 61, R30 after 63 and R31 after 65]

Our 32-bit Ripple-Carry Adder takes 65 gate delays to calculate the sum. If a gate delay is 1 ns, the addition time is 65 ns. This is very long by today's standards. We need to improve the timing. We will do that by designing a 32-bit Carry-Lookahead Adder in class.

2.8.3. Sequential Circuits

A sequential circuit consists of flip-flops and gates. It has flip-flops to store bits, meaning past inputs. Therefore, a sequential circuit output depends on the present inputs and also on past inputs. This means sequential circuits have the time dimension. A flip-flop operates differently from a gate in that it stores a bit if it receives an edge on the clock signal. The edge is either the high-to-low transition of the signal (negative edge) or the low-to-high transition of the signal (positive edge). A flip-flop uses only one of these two types of edges. For example, if a flip-flop stores when it receives a negative edge, then we say it is negative-edge triggered. There are several types of flip-flops. One that is used to implement registers is the Data (D) flip-flop. Another frequently used flip-flop is the J-K flip-flop, used to implement counters.
A D flip-flop has a single data input, which is stored when the clock edge is received. A J-K flip-flop has two data inputs and stores a bit when a clock edge is received. The timing of the edge is controlled by a Store control signal generated by the control unit.

2.8.3.1. Flip-Flops

The flip-flop has two outputs, Q and $\overline{Q}$, as shown below. The CE or clock enable input enables/disables the clock input, the C input. That is, if CE is 0, the clock input cannot be used. The clock or C input indicates when to store on the flip-flop. The triangle symbol next to the clock input indicates it is an edge-triggered clock input. The high-to-low transition symbol indicates the flip-flop stores when there is a negative edge. What is stored on the flip-flop depends on the data input, D. According to the operation table below, when CE is 1 and there is a negative edge on the clock input and the D input is 0, we store 0 after the negative edge, typically a few nanoseconds after the negative edge. If the D input is 1 at the edge, we store 1 after the edge. Finally, the last two rows indicate when the D input is ignored, that is, when the flip-flop does not store. When CE is 0 or when there is no negative edge, the D input is ignored. This is also known as Don't Care and is shown by an X symbol.

[Figure: the D flip-flop symbol with inputs D, CE and C and outputs Q and $\overline{Q}$]

| D | CE | C | Operation |
|---|----|---|-----------|
| 0 | 1 | negative edge | Store 0 after the negative edge |
| 1 | 1 | negative edge | Store 1 after the negative edge |
| X | 0 | X | Not stored |
| X | 1 | 0 (no negative edge) | Not stored |

2.8.3.2. Registers

A register is a sequential circuit used to store data temporarily. It stores data when a clock edge is applied at the end of the clock period in which the data needs to be stored. Note that a register which stores a value in a particular clock period actually gets the value at the beginning of the following clock period. The example below shows an imaginary 32-bit register named A, which stores a value when its Store A signal is 1, in clock periods 2 and 5. The D flip-flops of the register receive the edge at the end of clock periods 2 and 5.
[Figure: the 32-bit register A built from D flip-flops. The 32-bit OBUS drives the D inputs, Store A drives CE, Clock drives C and the Q outputs form the 32-bit value A. The timing diagram covers clock periods 1 to 6 and shows Clock, Store A, the negative edges at the end of clock periods 2 and 5, OBUS and A]

We study how the D flip-flop above can be used to store bits by using one of the bits of OBUS shown in the register example above. The rightmost flip-flop, A[0], and how it is stored are shown below. Note that we do not store every clock period, but only when it is necessary, by raising CE to 1 in a particular clock period.
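The operation table of the D flip-flop and the Store A timing of the register example can be imitated with a small behavioral model. This is a minimal Python sketch, not a gate-level design. The class name `DFlipFlop`, the method `tick`, the two-phase clock model (high during the period, falling at its end) and the sample OBUS[0] values are all assumptions made for the illustration.

```python
class DFlipFlop:
    """Negative-edge-triggered D flip-flop with clock enable (CE)."""

    def __init__(self):
        self.q = 0           # stored bit (output Q)
        self._prev_clk = 0   # clock level seen on the previous call

    def tick(self, d, ce, clk):
        """Present inputs; store D only on a 1 -> 0 clock transition with CE = 1."""
        negative_edge = self._prev_clk == 1 and clk == 0
        if ce == 1 and negative_edge:
            self.q = d
        self._prev_clk = clk
        return self.q

    @property
    def q_bar(self):
        """The complemented output."""
        return 1 - self.q

ff = DFlipFlop()                                   # models A[0]
obus0 = {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 1}       # hypothetical OBUS[0] per clock period
history = []
for period in range(1, 7):
    store_a = 1 if period in (2, 5) else 0         # Store A is 1 in clock periods 2 and 5
    ff.tick(obus0[period], store_a, clk=1)         # clock is high within the period
    ff.tick(obus0[period], store_a, clk=0)         # negative edge ends the period
    history.append(ff.q)                           # value A[0] holds in the following period
print(history)
```

In this model the stored value changes only at the ends of clock periods 2 and 5, no matter how the D input changes in the other periods, which is the behavior that distinguishes a flip-flop from a gate.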

[Figure: the flip-flop A[0]. OBUS[0] drives D, Store A drives CE and Clock drives C. The timing diagram covers clock periods 1 to 6 and shows Clock, Store A, the negative edges at the end of clock periods 2 and 5, OBUS[0] and A[0]]

The D input is used only when there is a negative edge on the clock input. Therefore, changes on the D line do not affect the output all the time, which is unlike the operation of a gate. Note that on the timing diagram, it looks like output A[0] changes at the same time the negative edge occurs. Actually, it changes a few nanoseconds after the negative edge. Similarly, if the input seems to change at the same time there is a negative edge, the value stored is the value right before the negative edge. For example, if OBUS[0] changes from 1 to 0 at the end of clock period 2, the value that is stored is the value right before the edge, which is 1.

2.9. Transistor Layer : This layer consists of digital electronic circuits. Digital electronic circuits are used to build digital circuits. That is, digital electronic circuits implement gates (and also flip-flops). Digital electronic circuits consist of transistors, resistors, capacitors, diodes, etc. Transistors are the main component and so this level is often called the transistor level. Transistors in these circuits are used as on-off switches. The switches are turned on and off by control inputs. The figure below shows on-off switches and how these switches are used to implement an AND gate as an example.

A switch is a device with two conditions :
- Open when the control input is 0
- Closed when the control input is 1

[Figure: on-off switches controlled by k and m pass a 1 to the output $k \cdot m$, implementing an AND gate]

The figure below gives an example of a gate implementation where the gate is a 2-input TTL NAND gate. The implementation of the NAND gate by transistors, resistors, etc. is shown next to the gate. TTL is a chip technology and is described below. Today, digital electronic circuits, i.e. circuits with transistors, resistors, capacitors, etc., are on chips. That is, gates and flip-flops are on chips.

[Figure: a 2-input NAND gate with inputs A and B and output $\overline{A \cdot B}$, next to a TTL NAND gate implementation by ON Semiconductor, with one transistor labeled]

We use semiconductor substances to implement transistors. That is, today's chips are semiconductor chips. Silicon and Gallium Arsenide are examples of semiconductor substances. Each substance has its own speed, cost and power consumption figures. The most common substance is Silicon, which is found in sea sand. This is why Silicon chip prices are so low.

Chip design is constrained by design goals : speed, cost, power consumption, size, weight, reliability, etc. Before the design is started, we determine these constraints and then design the product. We try not to exceed the constraints, by using the right number of gates and flip-flops and the right digital electronic implementations. However, it is not easy to satisfy them as they conflict with each other. For example, the higher the speed, the higher the cost and power consumption. Hence, a study of the spectrum of choices, from semiconductor substances to chip densities, is needed. The figure below shows the spectrum of substances and their relative speed for today's digital electronic circuits.

[Figure: Comparison of commonly used substances and digital electronic circuits for chips with respect to chip density. Substance used : Silicon, Silicon Germanium (SiGe), Gallium Arsenide, Niobium (superconducting, not a semiconductor), ordered from slower to faster. Transistor type : unipolar, bipolar. Transistor circuit : CMOS, BiCMOS, TTL, ECL. Number of gates on the chip : SSI, MSI, LSI, VLSI, ULSI]

Unipolar/bipolar transistors and other electronic components (resistors, capacitors, diodes, ...) are used to implement transistor circuits, such as CMOS, TTL, ECL and BiCMOS. By using a transistor circuit, we implement a single gate, for example a CMOS AND gate, a TTL AND gate, etc. The reason for using resistors, capacitors and diodes besides transistors in a gate is first to ensure the correct usage of the transistors and second to maintain the signal integrity, hence the operational stability, of the gate.
The number of electronic components on a chip depends on the intended functionality : the more functionality, the more components. A widely used classification of the integration of components on chips is given in Table 1 below.

The earliest chips from the 1960s were SSI chips and some of them are still used today. The current state-of-the-art microprocessors have more than 500 million components. The integration level for these high-density chips is beyond ULSI, but no new name has been agreed upon yet.

Table 1: Chip densities for various scales of integration

| Scale of Integration (chip density) | Siewiorek et al (1982) | Burger et al (1982) |
|---|---|---|
| Small Scale Integration (SSI) | < 10 gates | < 64 components |
| Medium Scale Integration (MSI) | < 100 gates | < 2K components |
| Large Scale Integration (LSI) | < 10,000 gates | < 64K components |
| Very Large Scale Integration (VLSI) | < 100,000 gates | < 2M components |
| Ultra Large Scale Integration (ULSI) | > 100,000 gates | > 2M components |

Silicon is the most commonly used substance and is used by high-speed microprocessors and high-density memory chips. Silicon is expected to be around for at least another 10 to 15 years. Table 2 below presents the state of the silicon technology.

Silicon transistor circuits (CMOS, TTL, ...) have different speed, cost and power consumption figures. A brief description of TTL and CMOS circuits is given below. TTL circuits are used for high-speed, low-cost applications, while CMOS is for high-density chips, such as microprocessors and memories (DRAM, SRAM). CMOS circuits are also used for portable applications that require low power consumption (space, embedded applications).

Table 3 compares the three most commonly used transistor circuits. CMOS is the preferred transistor circuit to implement microprocessors and high-density memory chips. This is because CMOS circuits consume the least amount of power among the three. TTL is the most widely available and cheapest one, while ECL is the fastest one. Finally, we list a number of properties of TTL and CMOS technologies below.
Table 2: The state of the silicon technology

| Characteristic | Silicon |
|---|---|
| Densest chip transistor circuit | CMOS |
| Transistors/chip (density) | 1,500,000,000 |
| Gate delay | 50-500 ps |
| Process | 32 nanometer |

Table 3: Summary of characteristics for three commonly used IC logic families

| Parameter | TTL | CMOS | ECL |
|---|---|---|---|
| Switching speed (gate delay) | Medium | Low | High |
| Power consumption | Medium | Low | High |
| Chip density | Medium | High | Low |
| Cost | Low | Medium | High |

Transistor-Transistor Logic (TTL) features

- TTL families : 7400 series : 74 (Standard), 74H (High speed), 74L (Low-power), 74S (Schottky), 74LS (Low-power Schottky), 74AS (Advanced Schottky), 74ALS (Advanced Low-power Schottky), 74F (Fast)
- Unused gate inputs : can be left unconnected (floating), but should be tied to a used input to be safe. They can also be connected to 1 or 0, depending on the input characteristic, via a pull-up resistor or a pull-down resistor, respectively
- Gate output circuits :
  - Totem-pole (do not short circuit gate outputs)
  - Tri-state (gate outputs can be short circuited if only one gate is enabled)
  - Open-collector (an external pull-up resistor is needed. Gate outputs can be short circuited)

Complementary Metal Oxide Semiconductor (CMOS) features

- CMOS families : 4000 series ; 7400 series : 74HC (High-speed CMOS), 74HCT (High-speed CMOS, TTL compatible), 74AC (Advanced CMOS), 74ACT (Advanced CMOS, TTL compatible), 74FCT (Fast CMOS, TTL compatible), 74FCT-T (Fast CMOS, TTL compatible with TTL V_OH)
- Unused gate inputs : do not leave them unconnected (floating). Tie them to a used input. They can also be connected to 1 or 0, depending on the input characteristic, via a pull-up resistor or a pull-down resistor, respectively
- Gate output circuits :
  - Regular (do not short circuit gate outputs)
  - Tri-state (gate outputs can be short circuited if only one gate is enabled)
  - Open-drain (an external pull-up resistor is needed. Gate outputs can be short circuited)
- Electrostatic discharge can damage CMOS chips. Unless properly grounded, one should not touch CMOS chips

3. The Big Picture : Transistors to Computers

Today, digital electronic circuits (transistor circuits) are on chips. That is, those transistors, resistors, capacitors, etc. are on chips. Chips are on printed circuit boards (PCBs), also known as cards. A PCB can contain tens of chips. The main PCB of a computer is called the motherboard, which contains the microprocessor and the memory chips.
Typically, how many PCBs a non-embedded computer can have in a single cabinet depends on the size of the PCB, together with the power and cooling arrangements of the cabinet and the room the cabinet is in. For example, a desktop computer can have two to six PCBs.

[Figure: a chip and a PCB]

Transistors and electronic components are placed in the center of the chip, in the area called the die. That is, the digital circuits implemented by transistors are on the die. Pins (terminals) of the chip allow the components on the die to be accessible from the external world. The die is connected to the pins by means of wires. Dice are placed on a wafer. The number of dice per wafer depends on the sizes of the wafer and the die. The size of the die depends on the complexity (functionality) of the digital circuit!

[Figure: the UC Berkeley VIRAM1 die (transistors are on the die) and the silicon wafer holding the VIRAM1 silicon dice. The wafer contains 72 VIRAM1 dice. Photos by Joseph Gebis : The Berkeley Intelligent RAM (IRAM) Project, http://iram.cs.berkeley.edu]

Just as chip design is constrained by speed, cost, power consumption, size, weight, reliability, etc., PCB design is also constrained by the same factors. Before the PCB design is started, we determine these constraints! Based on them, we go ahead and design the PCB. We keep the speed, cost, power consumption, size, weight, etc. of the PCB in mind, by using the right number of chips, chip implementations and wiring. The hierarchy of circuits from chips to the whole system is exemplified by the world's fastest supercomputer, the IBM Blue Gene/L, below.

[Figure: the IBM Blue Gene/L hierarchy from chip to system. Copyright IBM, IBM Deep Computing. Permission to use by IBM]

A microprocessor chip today contains several processors (cores, central processing units, CPUs), cache memories, memory management units (MMUs) and bus interfaces. These are implemented by registers, buses, arithmetic-logic units (ALUs), sequencers and other digital circuits. All of these digital circuits are implemented by gates and flip-flops. Finally, all the gates and flip-flops are implemented by transistor circuits which are on a single die. Therefore, transistor circuits on a die implement a microprocessor. Of course, transistor circuits also implement media processors, embedded processors, I/O processors, digital signal processors, memory controllers, the memory, etc. Below, the Intel Pentium 4 die is shown.

[Figure: Intel Pentium 4 processor die on a 0.18-micron process. Sources: http://www.intel.com; Computer Organization and Design: The Hardware/Software Interface, David A. Patterson and John L. Hennessy, 3rd edition, Morgan Kaufmann, 2005, p. 21.]

We pack hundreds of millions of transistors on a chip today. The number of transistors placed on a chip follows Moore's Law, which states that the number of transistors on a chip doubles every two years. This is due to our ability to shrink the size of the transistor and control its power consumption. What we call the process in Table 2 above is a measure of the size of a transistor on a chip. The process is 32 nanometers today and is reduced by one-third every two years, leading to the Moore's Law curve and the shrinking transistor size. Currently, we have chips with more than one billion transistors. The Intel dual-core Itanium chip is the densest microprocessor chip, with 1.72 billion transistors.

Below, the die of a typical MIPS-based microprocessor, the MIPS R10000, is shown, together with the general computer organization that is also implemented on the MIPS.

[Figure: general computer organization. Labels on the MIPS chip: CPU, L1 instruction cache, L1 data cache, L2 cache, MMU1, MMU2, bus interface. Other labels: one or more buses (system bus, ...), one or more buses (I/O buses), memory controller(s), physical memory, I/O controller(s), DMA1, DMA2, disk (virtual memory and file storage), disk, printer, terminal.]

[Figure: the MIPS R10000 die photo, http://bwrc.eecs.berkeley.edu/cic/die_photos/r10k.gif]
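The two scaling rules quoted for Moore's Law (transistor count doubling every two years, process size reduced by one-third every two years) can be checked against each other with a small sketch. The function names are hypothetical. The density estimate assumes that the area of a transistor scales with the square of the process size, a simplification not stated in the handout.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Transistor count after `years`, doubling once per period (Moore's Law)."""
    return start_count * 2 ** (years / doubling_period_years)


def projected_process_nm(start_nm: float, years: float,
                         shrink_per_period: float = 1 / 3,
                         period_years: float = 2.0) -> float:
    """Process size after `years`, reduced by `shrink_per_period` each period."""
    return start_nm * (1 - shrink_per_period) ** (years / period_years)


def density_gain(old_nm: float, new_nm: float) -> float:
    """Assumed model: transistor area is proportional to process size squared."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    # Starting points taken from the text: 32 nm and about one billion transistors.
    next_nm = projected_process_nm(32.0, 2.0)
    print(f"process after 2 years: {next_nm:.1f} nm")
    print(f"density gain under the area model: {density_gain(32.0, next_nm):.2f}x")
    print(f"transistors after 4 years: {projected_transistors(1e9, 4.0):.2e}")
```

Under the assumed area model, a one-third shrink gives a density gain of about 2.25 per two-year period, which is roughly the doubling that Moore's Law describes.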

Recently, power consumption has become a major concern. This is because, with so many transistors on the chip, the power consumption becomes large. Power is also related to the clock frequency: the higher the clock frequency, the higher the power consumption. When the power consumption is high, the temperature of the chip increases. If a hot chip is not cooled quickly, it will burn out. Thus, one has to use heat sinks, fans or liquids to cool the chip. However, cooling adds to the size, weight and cost of the chip and the PCB.

The recent shift in microprocessor design from one processor (core) to multiple processors (cores) on the chip is due to the increased power consumption. Simply put, engineers could not keep the microprocessor chip at low temperatures with simple cooling techniques when they increased the clock frequency. They had to lower the clock frequency. But that increased the execution time (CPUtime), meaning slower speeds. The solution for keeping the execution time low was to use multiple processors. All the processors execute instructions of the same application, performing more operations per clock period and compensating for the reduced clock rate.

Note that a multi-core microprocessor is not a uniprocessor. It is a parallel processing system! A multi-core chip requires a new CPUtime equation. It cannot use the one given in the textbook. What can it be?

Below, two multi-core dice are shown. One is the 9-core IBM Cell die and the other is the 2-core Intel Itanium-2 with 1.72 billion transistors.
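One possible way to explore the question of a multi-core CPUtime equation is sketched below. It is not the handout's answer. It starts from the usual uniprocessor form, CPUtime = instruction count x CPI / clock rate, and assumes an Amdahl-style model in which a fraction of the work (`parallel_fraction`, a hypothetical parameter) is split evenly across the cores while the rest runs on one core. All numbers in the example are made up for illustration.

```python
def cpu_time_single(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Uniprocessor CPU time in seconds: instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


def cpu_time_multicore(instruction_count: float, cpi: float, clock_hz: float,
                       cores: int, parallel_fraction: float = 1.0) -> float:
    """Assumed Amdahl-style model, not taken from the handout or the textbook.

    The serial part of the work runs on one core; the parallel part is
    divided evenly among `cores` with no communication overhead.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    one_core = cpu_time_single(instruction_count, cpi, clock_hz)
    return one_core * ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 6 billion instructions with a CPI of 1.
    fast_single = cpu_time_single(6e9, 1.0, 3e9)           # one core at 3 GHz
    slow_quad = cpu_time_multicore(6e9, 1.0, 2e9, 4)       # four cores at 2 GHz
    half_parallel = cpu_time_multicore(6e9, 1.0, 2e9, 4, 0.5)
    print(f"1 core  @ 3 GHz: {fast_single:.3f} s")
    print(f"4 cores @ 2 GHz, fully parallel: {slow_quad:.3f} s")
    print(f"4 cores @ 2 GHz, half parallel:  {half_parallel:.3f} s")
```

In this model the slower multi-core chip beats the faster single core only when enough of the application can run in parallel, which is why a multi-core chip has to be treated as a parallel processing system.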