EECS150 - Digital Design Lecture 9 - CPU Microarchitecture. CMOS Devices


EECS150 - Digital Design
Lecture 9 - CPU Microarchitecture
Feb 17, 2009
John Wawrzynek
Spring 2009 EECS150 - Lec9-cpu

CMOS Devices
Review: Transistor switch-level models
The gate acts like a capacitor. A high voltage on the gate attracts charge into the channel. If a voltage exists between the source and drain, a current will flow. In its simplest approximation, the device acts like a switch.
(Figure: nfet and pfet switch models.)

Transistor-level Logic Circuits
Simple rule for wiring up MOSFETs:
nfet is used only to pass logic zero.
pfet is used only to pass logic one.
For example, consider the NAND gate (figure).
Note: This rule is sometimes violated by expert designers under special conditions.

Transistor-level Logic Circuits
NOR gate (figure).
Note: out = 0 iff a OR b = 1, therefore out = (a + b)', the complement of a + b.
Again, the pfet network and the nfet network are duals of one another.
Other, more complex functions are possible. Ex: out = (a + bc)'
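The wiring rule and the duality of the two networks can be checked with a small switch-level model. This is only a sketch: it assumes an nfet conducts when its gate is 1 and a pfet conducts when its gate is 0, and the function names (`nand_networks`, `nor_networks`, `aoi_networks`, `gate_output`, `truth_table`) are made up for illustration.

```python
from itertools import product

def nand_networks(a, b):
    pull_down = a and b               # two nfets in series to GND
    pull_up = (not a) or (not b)      # two pfets in parallel to Vdd
    return pull_up, pull_down

def nor_networks(a, b):
    pull_down = a or b                # two nfets in parallel to GND
    pull_up = (not a) and (not b)     # two pfets in series to Vdd
    return pull_up, pull_down

def aoi_networks(a, b, c):
    # out = (a + bc)': nfet network implements a + bc, pfet network its dual
    pull_down = a or (b and c)
    pull_up = (not a) and ((not b) or (not c))
    return pull_up, pull_down

def gate_output(pull_up, pull_down):
    # Exactly one network may conduct: both would short Vdd to GND,
    # neither would leave the output floating.
    assert bool(pull_up) != bool(pull_down)
    return 1 if pull_up else 0

def truth_table(networks, n_inputs):
    return {bits: gate_output(*networks(*bits))
            for bits in product((0, 1), repeat=n_inputs)}

print(truth_table(nand_networks, 2))
print(truth_table(nor_networks, 2))
```

The assertion inside `gate_output` is the mutual-exclusion condition: it would fail for any input combination where the pull-up and pull-down networks were not complementary.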

CMOS Logic Gates in General
Pull-up network conducts under conditions to generate a logic 1 output.
Pull-down network conducts for logic 0 output.
Conductance must be mutually exclusive, else short circuit!
Pull-up and pull-down networks are topological duals.

Transmission Gate
Transmission gates are the way to build switches in CMOS.
In general, both transistor types are needed: nfet to pass zeros, pfet to pass ones.
The transmission gate is bi-directional (unlike logic gates).
It does not directly connect to Vdd and GND, but can be combined with logic gates or buffers to simplify many logic structures.

Transmission-gate Multiplexor
2-to-1 multiplexor: c = sa + s'b (s' is the complement of s)
Switches simplify the implementation (figure with signals a, b, s, and c).
Compare the cost to logic gate implementation.

Tri-state Buffers
Tri-state buffer: the third output state is high impedance (output disconnected).
Variations: inverting buffer, inverted enable.
A transmission gate is useful in the implementation.
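The multiplexor equation and its switch view can be compared directly. The sketch below is illustrative only; `mux2_logic` and `mux2_switches` are hypothetical names, and all signals are assumed to be 0 or 1.

```python
def mux2_logic(s, a, b):
    """Gate-level form: c = s*a + s'*b."""
    return (s & a) | ((1 - s) & b)

def mux2_switches(s, a, b):
    """Switch view: exactly one of two transmission gates is closed,
    so the output is simply connected to a (s = 1) or to b (s = 0)."""
    return a if s == 1 else b

for s in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            print(s, a, b, mux2_logic(s, a, b), mux2_switches(s, a, b))
```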

Tri-state Buffers
Tri-state buffers enable bidirectional connections.
Tri-state buffers are used when multiple circuits all connect to a common wire. Only one circuit at a time is allowed to drive the bus. All others disconnect their outputs, but can listen.

Tri-state Based Multiplexor
Multiplexor: if s=1 then c=a else c=b.
Transistor circuit for inverting multiplexor (figure).
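The "only one driver at a time" rule for a shared wire can be modelled in a few lines. This is a sketch under assumptions: the string "Z" stands for high impedance, and `tri_state` and `resolve_bus` are hypothetical helper names.

```python
Z = "Z"  # high impedance: the buffer output is disconnected

def tri_state(enable, value):
    """Drive value onto the wire when enabled, otherwise disconnect."""
    return value if enable else Z

def resolve_bus(driver_outputs):
    """Return the value seen on a shared wire, given every buffer output."""
    driven = [v for v in driver_outputs if v != Z]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else Z

# Three circuits share one wire; only the second one is enabled.
bus = resolve_bus([tri_state(0, 1), tri_state(1, 0), tri_state(0, 1)])
print(bus)
```

Treating two enabled drivers as an error is a modelling choice here. In a real circuit, two enabled buffers driving opposite values would fight over the wire.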

Latches and Flip-flops
Positive level-sensitive latch (figure).
Positive edge-triggered flip-flop built from two level-sensitive latches (figure).
Latch implementation (figure, clocked by clk).

Processor Microarchitecture Introduction
Microarchitecture: how to implement an architecture in hardware.
Good examples of how to put principles of digital design to practice.
Introduction to final project.
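The construction of an edge-triggered flip-flop from two level-sensitive latches can be sketched behaviorally. The figures are not recoverable from the transcript, so this sketch assumes the usual master/slave arrangement in which the first latch is enabled by the inverted clock. The class names are illustrative.

```python
class Latch:
    """Positive level-sensitive latch: transparent while enable is 1."""
    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d
        return self.q

class FlipFlop:
    """Positive edge-triggered flip-flop from two latches (assumed master/slave)."""
    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def update(self, clk, d):
        # Master is transparent while clk is 0, slave while clk is 1,
        # so q only changes when clk goes from 0 to 1.
        m = self.master.update(1 - clk, d)
        return self.slave.update(clk, m)

ff = FlipFlop()
for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    print(clk, d, ff.update(clk, d))
```

In the trace, d falling to 0 while clk is still 1 does not change the output. The new value only appears after the next rising edge.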

MIPS Processor Architecture
For now we consider a subset of MIPS instructions:
R-type instructions: and, or, add, sub, slt
Memory instructions: lw, sw
Branch instructions: beq
Later we'll add addi and j.

MIPS Microarchitecture Organization
Datapath + Controller + External Memory
(Figure: block diagram including the Controller.)

How to Design a Processor: step-by-step
1. Analyze instruction set architecture (ISA) to obtain datapath requirements:
   meaning of each instruction is given by the data transfers (register transfers)
   datapath must include storage element for ISA registers
   datapath must support each data transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the data transfer.
5. Assemble the control logic.

Review: The MIPS Instruction
R-type:  op [31:26]  rs [25:21]  rt [20:16]  rd [15:11]  shamt [10:6]  funct [5:0]
         6 bits      5 bits      5 bits      5 bits      5 bits        6 bits
I-type:  op [31:26]  rs [25:21]  rt [20:16]  address/immediate [15:0]
         6 bits      5 bits      5 bits      16 bits
J-type:  op [31:26]  target address [25:0]
         6 bits      26 bits
The different fields are:
op: operation ("opcode") of the instruction
rs, rt, rd: the source and destination register specifiers
shamt: shift amount
funct: selects the variant of the operation in the op field
address / immediate: address offset or immediate value
target address: target address of jump instruction
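The field boundaries above translate directly into shifts and masks. The following sketch extracts every field from a 32-bit instruction word; `decode` is a hypothetical helper, and the two example words are hand-assembled encodings of add $t0,$t1,$t2 and lw $t0,4($t1) using the standard MIPS register numbers ($t0 = 8, $t1 = 9, $t2 = 10).

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into all possible fields.
    Which fields are meaningful depends on the format (R, I, or J)."""
    return {
        "op":     (instr >> 26) & 0x3F,    # bits 31:26
        "rs":     (instr >> 21) & 0x1F,    # bits 25:21
        "rt":     (instr >> 16) & 0x1F,    # bits 20:16
        "rd":     (instr >> 11) & 0x1F,    # bits 15:11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,     # bits 10:6  (R-type)
        "funct":  instr & 0x3F,            # bits 5:0   (R-type)
        "imm16":  instr & 0xFFFF,          # bits 15:0  (I-type)
        "target": instr & 0x3FFFFFF,       # bits 25:0  (J-type)
    }

print(decode(0x012A4020))  # add $t0, $t1, $t2
print(decode(0x8D280004))  # lw  $t0, 4($t1)
```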

Subset for Lecture
add, sub, or, slt:
  addu rd,rs,rt
  subu rd,rs,rt
  format: op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), rd [15:11] (5 bits), shamt [10:6] (5 bits), funct [5:0] (6 bits)
lw, sw:
  lw rt,rs,imm16
  sw rt,rs,imm16
  format: op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), immediate [15:0] (16 bits)
beq:
  beq rs,rt,imm16
  format: op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), immediate [15:0] (16 bits)

Register Transfer Descriptions
All start with instruction fetch:
{op, rs, rt, rd, shamt, funct} ← IMEM[ PC ]  OR
{op, rs, rt, Imm16} ← IMEM[ PC ]
THEN
inst  Register Transfers
add   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
sub   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
or    R[rd] ← R[rs] | R[rt]; PC ← PC + 4
slt   R[rd] ← (R[rs] < R[rt]) ? 1 : 0; PC ← PC + 4
lw    R[rt] ← DMEM[ R[rs] + sign_ext(imm16) ]; PC ← PC + 4
sw    DMEM[ R[rs] + sign_ext(imm16) ] ← R[rt]; PC ← PC + 4
beq   if ( R[rs] == R[rt] ) then PC ← PC + 4 + {sign_ext(imm16), 00} else PC ← PC + 4
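The register transfers above can be read as a tiny interpreter. The sketch below executes one instruction per call on symbolic tuples rather than binary words. It makes several assumptions that are not in the slides: 32-bit wraparound arithmetic, a signed comparison for slt, data memory modelled as a dictionary that reads 0 for untouched addresses, and no special handling of register 0. The name `step` and the tuple layout are illustrative.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit registers

def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def to_signed(x):
    return x - (1 << 32) if x & 0x80000000 else x

def step(pc, R, DMEM, instr):
    """Apply one register transfer; return the next PC."""
    name = instr[0]
    next_pc = pc + 4
    if name in ("add", "sub", "or", "slt"):
        _, rd, rs, rt = instr
        a, b = R[rs], R[rt]
        if name == "add":
            R[rd] = (a + b) & MASK
        elif name == "sub":
            R[rd] = (a - b) & MASK
        elif name == "or":
            R[rd] = a | b
        else:
            R[rd] = 1 if to_signed(a) < to_signed(b) else 0
    elif name == "lw":
        _, rt, rs, imm16 = instr
        R[rt] = DMEM.get((R[rs] + sign_ext(imm16)) & MASK, 0)
    elif name == "sw":
        _, rt, rs, imm16 = instr
        DMEM[(R[rs] + sign_ext(imm16)) & MASK] = R[rt]
    elif name == "beq":
        _, rs, rt, imm16 = instr
        if R[rs] == R[rt]:
            # {sign_ext(imm16), 00} is the immediate shifted left by 2
            next_pc = pc + 4 + (sign_ext(imm16) << 2)
    else:
        raise ValueError("instruction not in the lecture subset: " + name)
    return next_pc & MASK

R = [0] * 32
R[1], R[2] = 5, 7
DMEM = {}
pc = step(0, R, DMEM, ("add", 3, 1, 2))
pc = step(pc, R, DMEM, ("sw", 3, 0, 8))
pc = step(pc, R, DMEM, ("lw", 4, 0, 8))
print(pc, R[3], R[4], DMEM)
```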

Microarchitecture
Multiple implementations for a single architecture:
Single-cycle: Each instruction executes in a single clock cycle.
Multicycle: Each instruction is broken up into a series of shorter steps, with one step per clock cycle.
Pipelined: Each instruction is broken up into a series of steps, with one step per clock cycle. Multiple instructions execute at once.

CPU clocking (1/2)
Single Cycle CPU: All stages of an instruction are completed within one long clock cycle.
The clock cycle is made sufficiently long to allow each instruction to complete all stages without interruption and within one cycle.
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Reg. Write

CPU clocking (2/2)
Multiple-cycle CPU: Only one stage of instruction per clock cycle.
The clock is made as long as the slowest stage.
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Reg. Write
Several significant advantages over single cycle execution: unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped).

MIPS State Elements
The state elements determine everything about the execution status of a processor:
PC register
32 registers
Memory
Note: for these state elements, clock is used for write but not for read (asynchronous read, synchronous write).

Single-Cycle Datapath: lw fetch
First consider executing lw:
R[rt] ← DMEM[ R[rs] + sign_ext(imm16) ]
STEP 1: Fetch instruction

Single-Cycle Datapath: lw register read
R[rt] ← DMEM[ R[rs] + sign_ext(imm16) ]
STEP 2: Read source operands from register file

Single-Cycle Datapath: lw immediate
R[rt] ← DMEM[ R[rs] + sign_ext(imm16) ]
STEP 3: Sign-extend the immediate

Single-Cycle Datapath: lw address
R[rt] ← DMEM[ R[rs] + sign_ext(imm16) ]
STEP 4: Compute the memory address

Single-Cycle Datapath: lw memory read
R[rt] ← DMEM[ R[rs] + sign_ext(imm16) ]
STEP 5: Read data from memory and write it back to register file

Single-Cycle Datapath: lw PC increment
STEP 6: Determine the address of the next instruction
PC ← PC + 4

Single-Cycle Datapath: sw
DMEM[ R[rs] + sign_ext(imm16) ] ← R[rt]
Write data in rt to memory

Single-Cycle Datapath: R-type instructions
R[rd] ← R[rs] op R[rt]
Read from rs and rt
Write ALUResult to register file
Write to rd (instead of rt)

Single-Cycle Datapath: beq
if ( R[rs] == R[rt] ) then PC ← PC + 4 + {sign_ext(imm16), 00}
Determine whether values in rs and rt are equal
Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)

Complete Single-Cycle Processor

Control Unit

Review: ALU
F2:0  Function
000   A & B
001   A | B
010   A + B
011   not used
100   A & ~B
101   A | ~B
110   A - B
111   SLT
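The ALU function table maps onto a simple behavioral model. This sketch assumes a 32-bit datapath and implements SLT as a signed comparison of the two operands, which is an assumption about the intended behavior rather than something the slide specifies. `alu` and `to_signed` are illustrative names.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(f, a, b):
    """Behavioral ALU following the F2:0 table; returns a 32-bit result."""
    a &= MASK
    b &= MASK
    not_b = ~b & MASK
    if f == 0b000:
        result = a & b
    elif f == 0b001:
        result = a | b
    elif f == 0b010:
        result = a + b
    elif f == 0b100:
        result = a & not_b
    elif f == 0b101:
        result = a | not_b
    elif f == 0b110:
        result = a - b
    elif f == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("F = 011 is not used")
    return result & MASK

print(alu(0b010, 2, 3), hex(alu(0b110, 2, 3)), alu(0b111, 0xFFFFFFFF, 1))
```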

Control Unit: ALU Decoder
ALUOp1:0  Meaning
00        Add
01        Subtract
10        Look at Funct
11        Not Used

ALUOp1:0  Funct          ALUControl2:0
00        X              010 (Add)
X1        X              110 (Subtract)
1X        100000 (add)   010 (Add)
1X        100010 (sub)   110 (Subtract)
1X        100100 (and)   000 (And)
1X        100101 (or)    001 (Or)
1X        101010 (slt)   111 (SLT)

Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000
lw           100011
sw           101011
beq          000100
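The ALU decoder table can be written as a short function that checks the rows in the order they are listed. This is a sketch, not the course's implementation. `alu_decoder` is a hypothetical name, and the unused ALUOp value 11 simply falls into the X1 row because that row comes first.

```python
# Funct field -> ALUControl, used only when ALUOp is 1X
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}

def alu_decoder(alu_op, funct):
    """Return ALUControl2:0 for a 2-bit ALUOp and 6-bit Funct."""
    if alu_op == 0b00:
        return 0b010              # add (used for lw and sw address calculation)
    if alu_op & 0b01:
        return 0b110              # row X1: subtract (used for beq)
    return FUNCT_TO_ALUCONTROL[funct]  # row 1X: look at Funct

print(format(alu_decoder(0b10, 0b101010), "03b"))
```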

Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01

Single-Cycle Datapath Example: or

Extended Functionality: addi
No change to datapath

Control Unit: addi
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000

Control Unit: addi
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000  1         0       1       0       0         0         00

Extended Functionality: j

Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
j            000010  0         X       X       X       0         X         XX        1
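The main decoder is a pure lookup from opcode to control signals, so a table-driven sketch is enough to express it. In this illustration, don't-care entries (X) are represented by `None`, the opcodes are the standard MIPS encodings, and `control_signals` is a hypothetical helper name.

```python
X = None  # don't care

SIGNALS = ("RegWrite", "RegDst", "AluSrc", "Branch",
           "MemWrite", "MemtoReg", "ALUOp", "Jump")

MAIN_DECODER = {
    0b000000: (1, 1, 0, 0, 0, 0, 0b10, 0),  # R-type
    0b100011: (1, 0, 1, 0, 0, 1, 0b00, 0),  # lw
    0b101011: (0, X, 1, 0, 1, X, 0b00, 0),  # sw
    0b000100: (0, X, 0, 1, 0, X, 0b01, 0),  # beq
    0b000010: (0, X, X, X, 0, X, X, 1),     # j
}

def control_signals(op):
    """Return the control signals for a 6-bit opcode as a dict."""
    if op not in MAIN_DECODER:
        raise ValueError("opcode not in the lecture subset: " + format(op, "06b"))
    return dict(zip(SIGNALS, MAIN_DECODER[op]))

print(control_signals(0b100011))
```

In hardware the don't-care entries give the logic synthesizer freedom to pick whichever value yields simpler logic. Here they are left unset.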

Review: Processor Performance
Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
                       = # instructions x CPI x T_c

Single-Cycle Performance
T_c is limited by the critical path (lw)

Single-Cycle Performance
Single-cycle critical path:
T_c = t_pcq_PC + t_mem + max(t_RFread, t_sext + t_mux) + t_ALU + t_mem + t_mux + t_RFsetup
In most implementations, limiting paths are: memory, ALU, register file.
T_c = t_pcq_PC + 2 t_mem + t_RFread + t_mux + t_ALU + t_RFsetup

Single-Cycle Performance Example
Element               Parameter   Delay (ps)
Register clock-to-q   t_pcq_PC    30
Register setup        t_setup     20
Multiplexer           t_mux       25
ALU                   t_ALU       200
Memory read           t_mem       250
Register file read    t_RFread    150
Register file setup   t_RFsetup   20
What is T_c?

Single-Cycle Performance Example
Using the same element delays:
T_c = t_pcq_PC + 2 t_mem + t_RFread + t_mux + t_ALU + t_RFsetup
    = [30 + 2(250) + 150 + 25 + 200 + 20] ps
    = 925 ps

Single-Cycle Performance Example
For a program with 100 billion instructions executing on a single-cycle MIPS processor, what is the execution time?
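The critical-path arithmetic can be reproduced from the delay table. The sketch below evaluates the full expression, including the max term. The slides give no value for the sign-extension delay, so `t_sext` is a hypothetical parameter defaulting to 0, which makes the register file read the limiting branch as the simplified formula assumes.

```python
DELAYS_PS = {
    "t_pcq_PC": 30,    # register clock-to-q
    "t_setup": 20,     # register setup (not on the lw critical path)
    "t_mux": 25,       # multiplexer
    "t_ALU": 200,      # ALU
    "t_mem": 250,      # memory read
    "t_RFread": 150,   # register file read
    "t_RFsetup": 20,   # register file setup
}

def cycle_time_ps(d, t_sext=0):
    """Single-cycle critical path (lw), in picoseconds."""
    return (d["t_pcq_PC"]
            + d["t_mem"]                                   # instruction fetch
            + max(d["t_RFread"], t_sext + d["t_mux"])      # operand ready
            + d["t_ALU"]                                   # address calculation
            + d["t_mem"]                                   # data memory read
            + d["t_mux"]                                   # result mux
            + d["t_RFsetup"])                              # register file write

print(cycle_time_ps(DELAYS_PS))
```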

Single-Cycle Performance Example
For a program with 100 billion instructions executing on a single-cycle MIPS processor,
Execution Time = # instructions x CPI x T_c
               = (100 x 10^9)(1)(925 x 10^-12 s)
               = 92.5 seconds

Pipelined MIPS Processor
Temporal parallelism
Divide single-cycle processor into 5 stages:
Fetch
Decode
Execute
Memory
Writeback
Add pipeline registers between stages
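The execution-time product from the single-cycle example is easy to check numerically. This is a minimal sketch; `execution_time_s` is an illustrative name, and the inputs are the figures quoted in the example (100 billion instructions, CPI of 1, 925 ps cycle time).

```python
def execution_time_s(n_instructions, cpi, t_c_ps):
    """Execution time in seconds = instructions x CPI x cycle time."""
    return n_instructions * cpi * t_c_ps * 1e-12

t = execution_time_s(100e9, 1, 925)
print(round(t, 3), "seconds")
```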

Single-Cycle vs. Pipelined Performance

Pipelining Abstraction

Single-Cycle and Pipelined Datapath

Corrected Pipelined Datapath
WriteReg must arrive at the same time as Result

Pipelined Control
Same control unit as single-cycle processor.
Control delayed to proper pipeline stage.

Pipeline Hazards
A hazard occurs when an instruction depends on results from a previous instruction that hasn't completed.
Types of hazards:
Data hazard: register value not written back to register file yet.
Control hazard: next instruction not decided yet (caused by branches).
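A data hazard of the kind described above can be spotted by scanning a program for reads that follow a write too closely. The sketch below is illustrative only. It assumes a 5-stage pipeline without forwarding in which a result becomes readable three instructions after it is issued, so the hypothetical `window` parameter defaults to 2 following instructions. The instruction tuples and the function name `data_hazards` are also made up for this example.

```python
def data_hazards(program, window=2):
    """Find read-after-write hazards.

    program: list of (name, dest, sources) tuples, dest may be None.
    window:  how many following instructions read too early (assumed 2).
    Returns a list of (writer_index, reader_index, register).
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards

program = [
    ("add", "s0", ("s2", "s3")),
    ("and", "t0", ("s0", "s1")),   # reads s0 one instruction later
    ("or",  "t1", ("s4", "s0")),   # reads s0 two instructions later
    ("sub", "t2", ("s0", "s5")),   # far enough away under the assumption
]
print(data_hazards(program))
```

The sketch ignores control hazards and does not account for a register being overwritten by an intervening instruction.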