Slide Set 6. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 6 for ENCM 369 Winter 2018 Section 01
Steve Norman, PhD, PEng
Electrical & Computer Engineering, Schulich School of Engineering, University of Calgary
February 2018

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 2/97

Contents

Introduction to Chapter 7 of the textbook
Components of synchronous, sequential logic systems
The PC, Register File, and Memory Units
A model for Register File internals
Textbook Section 7.2: Performance Analysis
Single-cycle processor: Overview
Details of datapaths for the single-cycle machine
Control for the single-cycle machine
Single-cycle timing example: LW instruction
More instructions, and next steps

Introduction to Chapter 7 of the textbook

This chapter is called Microarchitecture. It's about how computer processors actually read and execute instructions. So it's about hardware: digital logic circuits. It's not about programming.

Chapter 7 MIPS Instruction Subset

Much of the chapter focuses on several different logic circuit designs capable of running programs that use a small subset of the MIPS instruction set. The instructions in the subset are:

5 R-type instructions: ADD, SUB, AND, OR, SLT. (They're called R-type because all of their operands are Registers.)
3 other instructions: LW, SW, BEQ.

Extensions get made to some of the designs to support additional instructions, such as ADDI and J.

An outline of the first part of Chapter 7

7.1: Introduction. Includes descriptions of some key elements of processor designs: Program Counter (PC), Instruction Memory, Register File, Data Memory.
7.2: Performance Analysis. Brief discussion of how to measure and report computer system performance.
7.3: Single-Cycle Processor. A MIPS-subset processor that handles one instruction per long clock cycle, finishing each instruction before starting the next one.
7.4: Multi-Cycle Processor. A MIPS-subset processor that takes several short clock cycles to handle each instruction, again finishing each instruction before starting the next one.

This slide set is related to Sections 7.1 to 7.3.

Take time to do assigned reading in Chapter 7 carefully!

Many students may have been successful in ENCM 369 up to now without using the textbook very much. Things change with Chapters 7 and 8! It will be really hard to follow lecture, lab and tutorial material on processor designs without studying Chapter 7 of your textbook!

Components of synchronous, sequential logic systems

The simple computer designs we'll study are synchronous, sequential logic systems. Review of two important words from ENEL 353:

Sequential: A system in which the output depends not just on current input, but also on past input.
Synchronous: A system in which changes to the state bits occur all at the same time, in response to an active edge of a clock signal.

Let's look at some important components of synchronous, sequential processor designs.

Review of clock signals

[Figure: clock waveform alternating between 0 and 1 over time, with period T_C, high time t_H, low time t_L, and the rising and falling edges labelled.]

We'll use the same model for a clock signal that we used in ENEL 353. Things to note:

A rising edge, also called a positive edge, is a low-to-high transition.
A falling edge, also called a negative edge, is a high-to-low transition.
The clock period is T_C = t_H + t_L; the frequency is 1/T_C.
t_H is not necessarily equal to t_L.

About the clock signal model

[Figure: the same clock waveform, with period T_C, high time t_H, low time t_L, and rising and falling edges labelled.]

A model is a simplified description of a component that helps you understand and predict behaviour of a system that uses that component. The clock model we use is fine for ENCM 369 but not good enough for integrated circuit designers!

Real-world clock edges do not arrive at exactly the same time at all elements with clock inputs. That is called clock skew, which we covered in ENEL 353 but will ignore in ENCM 369. Also, real-world clock signals are not perfectly periodic.

An essential element: The D Flip-Flop

[Figure: symbols for two D flip-flops, each with inputs CLK and D and output Q; the negative-edge-triggered symbol includes a bubble.]

Positive-edge-triggered D flip-flop: The state Q copies the input D on each rising clock edge.
Negative-edge-triggered D flip-flop: The state Q copies the input D on each falling clock edge. (The bubble symbol indicates inversion of a signal.)

All of the DFFs we saw in ENEL 353 in Fall 2017 were positive-edge-triggered. We'll see in Section 7.5 that sometimes it's useful to have state updates on negative clock edges.

D flip-flop: Example behaviour and a useful definition

For a few clock cycles, with one positive-edge-triggered D flip-flop and one negative-edge-triggered D flip-flop, let's see how the Q outputs respond to changes in D inputs...

Definition: Active clock edge means
rising clock edge, for positive-edge-triggered DFFs;
falling clock edge, for negative-edge-triggered DFFs.

D flip-flops: What's the point?

This is important! If you're not clear about this point, you will not really understand any of the circuits in Chapter 7 of the textbook!

A clock cycle is a span of time from one active edge of a clock to the next active edge. A D flip-flop captures the value of the input bit D at the end of a clock cycle, and makes that captured bit value available on Q throughout the next clock cycle.
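The capture-then-hold behaviour described above can be sketched in a few lines of Python. This is only a behavioural model under simplifying assumptions (no setup or hold times, no propagation delay); the class name `DFlipFlop` and the method `active_edge` are hypothetical names chosen for illustration.

```python
class DFlipFlop:
    """Behavioural sketch of an edge-triggered D flip-flop (illustrative only)."""

    def __init__(self, initial=0):
        self.q = initial  # value visible on Q during the current clock cycle

    def active_edge(self, d):
        """Model one active clock edge: capture D, then hold it on Q."""
        self.q = d & 1
        return self.q


ff = DFlipFlop()
d_at_end_of_cycle = [1, 1, 0, 1]  # value of D just before each active edge
q_during_next_cycle = [ff.active_edge(d) for d in d_at_end_of_cycle]
print(q_during_next_cycle)  # Q in each following cycle equals the captured D
```

Between calls to `active_edge`, nothing changes `q`, which mirrors the idea that Q is stable throughout a clock cycle.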

D flip-flops: Applications

Typical use in ENEL 353: Two or three D flip-flops are used to hold the state of a synchronous finite state machine.

Typical use in ENCM 369: A group of D flip-flops (usually but not always 32 of them) is used to form a register, the state of which can only be updated on an active clock edge.

A 32-bit register

[Figure: 32 D flip-flops with inputs D31 down to D0 and outputs Q31 down to Q0, all sharing one CLK input.]

This gets updated once per clock cycle, on positive clock edges. Each DFF receives the same CLK input. The diagram shows the structure but is awkward to draw, so we'll use a compact symbol.

[Figure: compact register symbol with CLK, a 32-bit input D31:0, and a 32-bit output Q31:0.]

Should a given register do a state update at the end of every single clock cycle?

Looking ahead a little... The answer is definitely yes for the PC register as used in Section 7.3, but no for the PC in Section 7.4, and not quite always for the PC in Section 7.5.

Looking ahead some more... The answer is yes for pipeline registers, introduced in Section 7.5.

What about updates for the MIPS GPRs (general-purpose registers)? Assume that we are designing a machine to handle one instruction per clock cycle.

Wires

A wire connects an output bit of some element to the input bit(s) of one or more elements. To keep things simple, we'll model signalling over wires as happening without delay. But keep in mind that in real-world design of high-speed circuits, accounting for wire delays can be very important.

Let's sketch some conventions for drawing wires and groups of wires.

Note: To reduce clutter, the textbook often uses a solid thick line for a multi-wire bus, instead of the usual bus notation.

The PC, Register File, and Memory Units

PC (program counter): A special-purpose register used to hold an instruction address.

Register File: Contains the MIPS GPRs (general-purpose registers). This is not at all like a file in the sense of files in folders in a file system!

Memory Units: So far in ENCM 369, our model for a computer has a single memory array, holding both instructions and data. But the simplest possible design for our MIPS subset requires split memory:
Instruction Memory, a container for machine code instructions;
Data Memory, written to by store instructions, and read from by load instructions.

The PC

[Figure: PC register symbol with CLK input, 32-bit input PC', and 32-bit output PC.]

This is a simple 32-bit register. PC, the current value, consists of the Q outputs of 32 DFFs. PC', the next value, is a 32-bit signal applied to the D inputs of those DFFs.

Not shown:
a reset input, to force the PC into a known state on system power-up;
an enable input, for systems in which PC updates happen on some but not all rising edges of CLK.

Instruction Memory

[Figure: Instruction Memory symbol with address input A and read data output RD.]

A is a 32-bit address input. RD is a 32-bit read data output.

Most real computers allow users to modify the programs that the computer can run. That's not allowed by this simple ROM (read-only memory) circuit. To change the program in our simple computer, you would have to pull a ROM chip out of the instruction memory socket, and replace it with a different ROM chip containing different instructions.

Register File: Inputs and Outputs (1)

[Figure: Register File symbol with 5-bit inputs A1, A2, and A3, inputs CLK, WD3, and WE3, and outputs RD1 and RD2.]

Note the CLK input. This is a synchronous sequential element. State updates can happen only on rising edges of CLK.

A1, A2, and A3 are address inputs. They could also be called register select inputs. Why are they each 5 bits wide?

But the prof said, "Registers don't have addresses!"

Let me be more precise: Registers do not have main memory addresses. You cannot make a C pointer point to a MIPS GPR. You cannot use MIPS GPRs for array elements, because the necessary address arithmetic won't work.

However, a 5-bit address of a MIPS GPR really is an address, in the sense that it is a number that selects the GPR out of the set of GPRs.

Register File: Inputs and Outputs (2)

[Figure: Register File symbol with 5-bit inputs A1, A2, and A3, inputs CLK, WD3, and WE3, and outputs RD1 and RD2.]

RD1 and RD2 are 32-bit outputs: read data ports. WD3 is a 32-bit input: a write data port.

What is the relationship between the three data ports RD1, RD2 and WD3, the address inputs A1, A2, and A3, and the 1-bit write enable signal WE3?

Data Memory

[Figure: Data Memory symbol with inputs WE, A, WD, and CLK, and output RD.]

Again, note the CLK input. This, like the Register File, is a synchronous sequential element. State updates can happen only on rising edges of CLK.

Let's make some notes about how this element behaves.

A slightly fancier Data Memory

Your instructor would prefer the Data Memory element to have two control inputs, EN (enable) and R/W (read/not-write)...

[Figure: SN's Data Memory symbol with inputs EN, R/W, A, WD, and CLK, and output RD.]

EN   R/W   action
0    X     none
1    0     write to address A
1    1     read from address A

That would have avoided waste of energy in handling instructions that are neither loads nor stores. But we'll follow the textbook; the authors had the very reasonable goal of minimal clutter in schematics.
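The EN and R/W table above can be expressed as a small behavioural sketch. Several details here are assumptions made only for illustration: the class and method names are hypothetical, writes are modelled as happening on a rising clock edge, reads are modelled as combinational, and locations that were never written read as 0.

```python
class FancierDataMemory:
    """Sketch of a Data Memory with EN and R/W control inputs (illustrative)."""

    def __init__(self):
        self.words = {}  # sparse storage: address -> 32-bit word

    def rising_edge(self, en, r_w, a, wd):
        """State update: only EN = 1 with R/W = 0 performs a write."""
        if en == 1 and r_w == 0:
            self.words[a] = wd & 0xFFFFFFFF

    def read(self, en, r_w, a):
        """Only EN = 1 with R/W = 1 performs a read; otherwise no action."""
        if en == 1 and r_w == 1:
            return self.words.get(a, 0)  # assumption: unwritten words read as 0
        return None  # memory is idle or writing, so RD is not meaningful


mem = FancierDataMemory()
mem.rising_edge(en=1, r_w=0, a=8, wd=0x12345678)  # write
mem.rising_edge(en=0, r_w=0, a=8, wd=0)           # EN = 0: no action
print(hex(mem.read(en=1, r_w=1, a=8)))
```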

Abbrevs

To express ideas quickly and concisely, I will use the following abbreviations from time to time in slides and lecture notes...

I-Mem: Instruction Memory
R-File: Register File
D-Mem: Data Memory

A model for Register File internals

Section 5.5.5 of the textbook suggests that register files are usually built as SRAM (static RAM) arrays. An SRAM-based design is small and efficient, but its operation is hard to explain in the context of year 2 ENEL and ENSF curricula. (ENEL students: See ENCM 467 for SRAM details.)

So I'll present a design in which registers are made of enabled D flip-flops.

Pros and cons of the R-File model of this slide set

Pros:
It's built entirely from components presented thoroughly in ENEL 353: decoders, muxes, and DFFs with enable inputs.
It accurately suggests that decoders are essential parts of R-File designs.

Cons:
As mentioned before, bits in real R-Files are likely held in SRAM cells, not DFFs. (A DFF is much larger than an SRAM cell and consumes much more energy.)
The model uses 32-bit 32:1 bus multiplexers. Those muxes would work perfectly in theory, but in practice would tend to be unreasonably large and slow.

32-bit register with EN input

[Figure: 32 enabled D flip-flops with inputs D31 down to D0 and outputs Q31 down to Q0; every DFF receives CLK, and every EN input is driven by the signal registeren.]

Each DFF receives the same CLK input. On positive clock edges,
all the DFFs copy D to Q if registeren = 1;
all the DFFs keep their old Q values if registeren = 0.

[Figure: compact symbol with CLK, a 32-bit input D31:0, a 32-bit output Q31:0, and an EN input driven by registeren.]
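As a rough illustration of the enable rule above, the sketch below models the whole 32-bit register as one integer. The class name `EnabledRegister`, the method `positive_edge`, and the parameter `register_en` are hypothetical names for this sketch, not names from the slides or the textbook.

```python
class EnabledRegister:
    """Behavioural sketch of a 32-bit register built from enabled DFFs."""

    def __init__(self):
        self.q = 0  # the 32 Q outputs, packed into one integer

    def positive_edge(self, d, register_en):
        """On a positive clock edge: copy D to Q only if the enable is 1."""
        if register_en == 1:
            self.q = d & 0xFFFFFFFF
        return self.q  # with enable = 0, the old Q value is kept


reg = EnabledRegister()
reg.positive_edge(0xDEADBEEF, register_en=1)  # Q becomes 0xDEADBEEF
reg.positive_edge(0x00000001, register_en=0)  # Q is unchanged
print(hex(reg.q))
```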

The Register File has 32 GPRs

[Figure: compact symbol for an enabled register with CLK, D31:0, Q31:0, and EN driven by registeren.]

How many of the above 32-bit enabled registers will we need? How many enabled D flip-flops is that? How can we build GPR 0 ($zero)?

Register file building block: 5-to-32 decoder

[Figure: decoder symbol with inputs A4, A3, A2, A1, A0 and EN, and outputs Y31 down to Y0.]

If EN ("enable") is 0, then all 32 output bits are 0. If EN is 1, then one of the outputs is 1, as selected by the 5-bit number A4 A3 A2 A1 A0, and the other 31 outputs are 0.

This is just a larger version of the decoder-with-enable circuits (2-to-4, 3-to-8, 4-to-16) we saw several times in ENEL 353.

5-to-32 decoder truth table

EN   bits A4..A0   bits Y31..Y0
0    XXXXX         0000 0000 0000 0000 0000 0000 0000 0000
1    00000         0000 0000 0000 0000 0000 0000 0000 0001
1    00001         0000 0000 0000 0000 0000 0000 0000 0010
1    00010         0000 0000 0000 0000 0000 0000 0000 0100
...
1    01111         0000 0000 0000 0000 1000 0000 0000 0000
1    10000         0000 0000 0000 0001 0000 0000 0000 0000
...
1    11110         0100 0000 0000 0000 0000 0000 0000 0000
1    11111         1000 0000 0000 0000 0000 0000 0000 0000

XXXXX in the first row means this: If EN is 0, it doesn't matter what A4..A0 are.
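The truth table can be summarized by a one-line rule: when EN is 1, output bit Y_a is 1 and all others are 0. The following sketch packs Y31..Y0 into a single integer; the function name `decoder_5_to_32` is a hypothetical choice for illustration.

```python
def decoder_5_to_32(en, a):
    """Return Y31..Y0 packed into an integer, where bit i of the result is Y_i."""
    if not 0 <= a <= 31:
        raise ValueError("a must fit in 5 bits")
    if en == 0:
        return 0      # EN = 0: all 32 outputs are 0, whatever a is
    return 1 << a     # EN = 1: exactly one output, Y_a, is 1


# Print a few rows in the same layout as the table above.
for en, a in [(0, 5), (1, 0), (1, 15), (1, 16), (1, 31)]:
    bits = format(decoder_5_to_32(en, a), "032b")
    grouped = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
    print(en, format(a, "05b"), grouped)
```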

Register file write logic, supported by a 5-to-32 decoder

[Figure: Register File symbol with 5-bit inputs A1, A2, and A3, inputs CLK, WD3, and WE3, and outputs RD1 and RD2.]

How should the decoder inputs be driven? Where will the 32 bits of output from the decoder go? Those choices will result in the R-File write logic shown on the next slide...

[Figure: R-File write logic. Decoder outputs Y31, Y30, ..., Y2, Y1 drive the EN inputs of the 32-bit registers GPR31, GPR30, ..., GPR02, GPR01; the registers share CLK and the R-File WD3 input.]

Register file read logic (1)

With the goal of saving some time, we're not going to go into detail here.

One possible arrangement is to use two (large!) 32-bit 32:1 bus multiplexers. The 5-bit select inputs to the bus muxes would be the A1 and A2 R-File inputs. The first bus mux would use A1 to select one of 32 32-bit GPR values to copy to the RD1 output of the R-File, and the second bus mux would do the same thing with A2 and RD2.

(As noted previously, a 32-bit 32:1 bus mux is an impractically large circuit element. A tutorial will present register file read logic that does not use muxes and is somewhat closer to what real-world register files use.)

Register file read logic (2)

[Figure: Register File symbol with 5-bit inputs A1, A2, and A3, inputs CLK, WD3, and WE3, and outputs RD1 and RD2.]

Reminder:
The write logic is sequential: a GPR update can only happen in response to an active clock edge.
The read logic is combinational: when A1 or A2 change, RD1 or RD2 will change without waiting for a clock edge.
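To tie the write logic and read logic together, here is a behavioural sketch of the Register File. It is a software model, not the decoder-and-mux circuit itself; the class and method names are hypothetical, and GPR 0 is modelled as a constant zero by ignoring writes to it, which is one possible way to get the usual MIPS $zero behaviour.

```python
class RegisterFile:
    """Sketch: sequential writes on a clock edge, combinational reads."""

    def __init__(self):
        self.gpr = [0] * 32  # one 32-bit value per GPR

    def read(self, a1, a2):
        """Combinational read ports: RD1 and RD2 follow A1 and A2 immediately."""
        return self.gpr[a1], self.gpr[a2]

    def rising_edge(self, a3, wd3, we3):
        """Write port: WD3 is stored in the GPR selected by A3 only if WE3 = 1."""
        if we3 == 1 and a3 != 0:  # assumption: GPR 0 ($zero) is never updated
            self.gpr[a3] = wd3 & 0xFFFFFFFF


rf = RegisterFile()
rf.rising_edge(a3=8, wd3=42, we3=1)  # write 42 to GPR 8
rf.rising_edge(a3=9, wd3=99, we3=0)  # WE3 = 0: no update
rd1, rd2 = rf.read(8, 9)
print(rd1, rd2)
```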

Textbook Section 7.2: Performance Analysis

The textbook presents this equation for the execution time of a program:

execution time = IC × CPI × T_C

IC is instruction count, the number of instructions executed in a program run. IC is not the size of a program! A single instruction can count many times in IC. For example, if a 10-instruction loop runs 450 times, the loop makes a 4500-instruction contribution to IC.

T_C is the processor clock period.

We'll look at CPI in more detail on other slides...
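A quick worked example of the equation, as a sketch. The loop numbers come from the slide; the CPI of 1.0 and the clock period of 1 ns are hypothetical values picked only to make the arithmetic concrete.

```python
def execution_time(ic, cpi, t_c):
    """execution time = IC x CPI x T_C, with T_C in seconds."""
    return ic * cpi * t_c


loop_ic = 10 * 450  # a 10-instruction loop that runs 450 times
hypothetical_cpi = 1.0   # assumed value, for illustration only
hypothetical_t_c = 1e-9  # assumed 1 ns clock period, for illustration only

t = execution_time(loop_ic, hypothetical_cpi, hypothetical_t_c)
print(loop_ic, "instructions,", t, "seconds")
```

Note that the loop contributes 4500 to IC even though it occupies only 10 instructions of program memory.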

More about IC: Instruction count

For a program written in a language such as C, the factors affecting IC are:
ISA (instruction set architecture);
how good the compiler is at translating pieces of high-level language into efficient sequences of instructions;
what the program input is.

Microarchitecture can't do much about IC, but has a major impact on both CPI and T_C...

CPI: Clock cycles per instruction (1)

Low CPI is good for processor performance, and high CPI is bad.

Microarchitectures (arrangements of registers, memory systems, and arithmetic/logic circuits) have a major influence on CPI. Textbook Chapter 7 looks at three kinds of microarchitecture:
single-cycle, with a CPI of 1;
multi-cycle, with a CPI of approximately 4;
pipelined, with a CPI just a little greater than 1.

But CPI is not a number determined exactly and entirely by microarchitecture...

CPI: Clock cycles per instruction (2)

CPI is not determined entirely by microarchitecture. In fact, CPI is also program- and data-dependent:
Certain programs, with certain inputs, result in execution of mostly "easy", low-CPI instructions.
Other programs, and/or other inputs, result in execution of a higher concentration of "hard", high-CPI instructions.

So CPI is a useful concept, but not a number you can precisely specify for any particular processor design.

T_C: Processor clock period

Here's a review of the performance equation:

execution time = IC × CPI × T_C

We'll see that there is a tradeoff between CPI and T_C:
design ideas that reduce CPI tend to increase T_C;
design ideas that reduce T_C tend to increase CPI.

Another unwelcome consideration is that decreasing T_C increases clock frequency (1/T_C), which increases average power consumption of a processor.

Single-cycle processor: Overview

Let's start with definitions of datapath and control.

Datapath: A collection of circuit elements, connected in a way that will generate a result for some category of instructions. For example, the datapath for an LW instruction will include PC, I-Mem, R-File, D-Mem and a few other important elements.

Control: A circuit element designed to send signals to datapath elements, to tell those elements what to do and sometimes when to do it. Example: A control circuit for our MIPS subset will need to turn on the WE input of D-Mem for SW, but turn it off for all other instructions.

One clock cycle in the single-cycle machine

This sketch shows how every instruction will work...

[Figure: timing sketch of one CLK cycle. At the active edge that starts the cycle: update to PC, maybe to R-File or D-Mem, from the previous instruction. During the cycle: the datapath generates result(s) of the current instruction, followed by a "Result(s) ready" interval. At the active edge that ends the cycle: update to PC, maybe to R-File or D-Mem, from the current instruction.]

The width of the "Result(s) ready" time interval will differ between instructions, but must always be greater than some kind of setup time for safe operation.

Details of datapaths for the single-cycle machine

The first datapath we'll look at is the datapath for LW. After that we'll move on to SW, R-type instructions, and BEQ.

Before we start on LW, we'll need a few more datapath elements: 32-bit adders, a 16-to-32-bit sign-extend unit, and a 32-bit ALU (arithmetic/logic unit).

32-bit adder circuit for Chapter 7

[Figure: adder symbol with 32-bit inputs A31:0 and B31:0 and 32-bit output Y31:0.]

Things to note:
The carry-in to the LSB is 0.
There is no carry-out-from-MSB output.
As explained in previous lectures, this circuit works for both unsigned and signed addition, without any sort of input to indicate which of signed or unsigned computation is desired.

16-to-32-bit sign extend circuit

[Figure: Sign Extend symbol with a 16-bit input and a 32-bit output.]

The above symbol can be (conceptually) implemented as shown below. (A practical circuit would probably include some buffers so that a single input wire would not have to drive 17 output wires.)

[Figure: wiring of the sign extend circuit. Input 15 drives output 31 down to output 16 as well as output 15; input 14 drives output 14; and so on down to input 0, which drives output 0.]
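The wiring described above, in which input bit 15 is copied into output bits 31 down to 16, can be sketched with integer bit operations. The function name `sign_extend_16_to_32` is hypothetical.

```python
def sign_extend_16_to_32(value):
    """Copy input bits 15:0 to output bits 15:0, and input bit 15 to bits 31:16."""
    low = value & 0xFFFF               # outputs 15..0 are driven by inputs 15..0
    sign = (low >> 15) & 1             # input 15 is the sign bit
    high = 0xFFFF0000 if sign else 0   # outputs 31..16 all copy input 15
    return high | low


for offset in (0x0004, 0x7FFF, 0x8000, 0xFFFC):
    print(format(offset, "#06x"), "->", format(sign_extend_16_to_32(offset), "#010x"))
```

Input bit 15 ends up on 17 outputs in total (bits 31 down to 15), which matches the count of output wires mentioned on the slide.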

32-bit ALU (arithmetic/logic unit)

This combinational element has 67 input wires and 33 output wires.

[Figure: ALU symbol with 32-bit inputs A31:0 and B31:0, 3-bit input ALUControl, 1-bit output Zero, and 32-bit output Y31:0.]

[Figure: ALU symbol with 32-bit inputs A31:0 and B31:0, 3-bit input ALUControl, 1-bit output Zero, and 32-bit output Y31:0.]

5 of the 8 possible ALUControl input bit patterns will matter for our MIPS-subset processor designs:

ALUControl (base two)   Y        notes...
000                     A & B    bitwise AND
001                     A | B    bitwise OR
010                     A + B    addition
110                     A - B    subtraction
111                     A < B    0 for false, 1 for true

What important aspect of the set-on-less-than comparison does the table NOT specify?
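The table can be turned into a small behavioural sketch. Two choices below are assumptions rather than facts stated on this slide: the less-than comparison treats A and B as signed two's-complement numbers, and Zero is set to 1 exactly when Y is all zeros, which is the usual convention for a zero flag. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, alu_control):
    """Return (Y, Zero) for the five ALUControl patterns in the table."""
    a &= MASK32
    b &= MASK32
    if alu_control == 0b000:
        y = a & b                 # bitwise AND
    elif alu_control == 0b001:
        y = a | b                 # bitwise OR
    elif alu_control == 0b010:
        y = (a + b) & MASK32      # addition, carry-out discarded
    elif alu_control == 0b110:
        y = (a - b) & MASK32      # subtraction, wrapped to 32 bits
    elif alu_control == 0b111:
        y = 1 if to_signed(a) < to_signed(b) else 0  # assumption: signed compare
    else:
        raise ValueError("ALUControl pattern not used by the MIPS subset")
    zero = 1 if y == 0 else 0     # assumption: Zero flags an all-zero result
    return y, zero


y, zero = alu(7, 7, 0b110)
print(y, zero)  # prints 0 1
```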

[Figure: ALU symbol with 32-bit inputs A31:0 and B31:0, 3-bit input ALUControl, 1-bit output Zero, and 32-bit output Y31:0.]

The 1-bit Zero signal has a confusing name! Sometimes the ALU will make Zero = 0 and sometimes the ALU will make Zero = 1.

Let's write down the rules for how the Zero signal is computed.

Looking ahead... For our MIPS subset, for which instruction will the Zero signal be useful?

ALU examples

[Figure: ALU symbol with 32-bit inputs A31:0 and B31:0, 3-bit input ALUControl, 1-bit output Zero, and 32-bit output Y31:0.]

For each of the examples in the table, what will the outputs Y and Zero be?

example   A             B             ALUControl
(1)       0x0000_0002   0x0000_0003   001
(2)       0x0000_0002   0x0000_0003   010
(3)       0xffff_ffff   0x0000_0000   111
(4)       0x0000_002a   0x0000_002a   110

Details of ALU design

See textbook Section 5.2.4. ENCM 369 will not cover the details of ALU design down to the level of logic gates.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 58/97 Back to the datapath for LW...
We have looked at all of the necessary datapath elements. Before we try to organize those elements, let's review the machine code formats for LW and SW...

       bits 31:26   bits 25:21    bits 20:16    bits 15:0
LW     100011       pointer GPR   dest. GPR     offset
SW     101011       pointer GPR   source GPR    offset
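
The field boundaries above can be expressed with shifts and masks. The following is a rough sketch, with made-up names (`decode_itype`, `rs`, `rt`, `imm`) that follow common MIPS usage rather than anything in the slides. The example word 0x8E111234 is the encoding of LW $s1, 0x1234($s0), the instruction used on slide 84.

```python
def decode_itype(instr: int) -> dict:
    """Split a 32-bit LW/SW machine word into its four fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21, pointer GPR
        "rt": (instr >> 16) & 0x1F,      # bits 20:16, dest. (LW) or source (SW) GPR
        "imm": instr & 0xFFFF,           # bits 15:0, offset
    }


fields = decode_itype(0x8E11_1234)       # LW $s1, 0x1234($s0)
print({name: bin(value) for name, value in fields.items()})
```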

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 59/97 LW datapath: Instruction fetch
[Schematic: the PC, clocked by CLK, supplies the instruction address to the A input of the Instruction Memory. The RD output is Instr, which fans out as the instruction fields Instr[31:26], Instr[5:0], Instr[25:21], Instr[20:16], Instr[20:16], Instr[15:11], and Instr[15:0].]
Instr is short for instruction, obviously. How many wires are there in total for instruction fields, and why is that number so much greater than 32?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 60/97 LW datapath: GPR read and address calculation
[Schematic: Instr[25:21] drives the 5-bit A1 input of the Register File (other ports shown: A2, A3, WD3, WE3, CLK, RD1, RD2). Instr[15:0] passes through the 16-bit-input Sign Extend block. The ALU, with its 3-bit ALUControl input, produces Zero and ALUResult.]
Which signal is the data memory address? What bit pattern should be applied to ALUControl? Why use an ALU at all? Wouldn't it be easier to use an adder?

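ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 61/97 LW datapath: D-Mem read and R-File update
[Schematic: Instr[20:16] drives the 5-bit A3 input of the Register File. The signal from the ALU drives the A (address) input of the Data Memory, and the Data Memory RD output drives the Register File WD3 input. The Register File and the Data Memory are both clocked by CLK and have write-enable inputs WE3 and WE.]
Note the role of instruction bits Instr[20:16]! What are the correct values for the WE input to the D-Mem and the WE3 input to the R-File?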

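ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 62/97 LW datapath: PC update
At the same time LW is doing its job of copying a word from Data Memory to the Register File, an update to the PC must be generated. What does the symbol 4 mean in this schematic?
[Schematic: the PC register, clocked by CLK, sends its output PC to I-Mem and to an adder whose other input is 4; the adder output feeds the input of the PC register.]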

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 63/97 RTFT: Read The Fine Textbook Section 7.3.1 of the textbook explains the single-cycle datapaths for LW, SW, R-type and BEQ instructions in clear and careful detail, with schematics that are very difficult to squish into legible lecture slides. Please read this textbook material carefully! For the same reason, please be ready to carefully read other recommended sections of Chapter 7! Historical note: RTFM, or Read The F****** Manual, is advice that experienced programmers have been handing out for decades.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 64/97
[Figure: schematic of the complete single-cycle processor. The Control Unit has inputs Op (Instr[31:26]) and Funct (Instr[5:0]). Control signals shown: MemtoReg, MemWrite, Branch, ALUControl[2:0], ALUSrc, RegDst, PCSrc, RegWrite. Datapath signals shown: PC', PC, Instr, PCPlus4, WriteReg[4:0], SignImm, SrcA, SrcB, Zero, ALUResult, WriteData, PCBranch, ReadData, Result.]
Image is Figure 7.11 from Harris D. M. and Harris S. L., Digital Design and Computer Architecture, 2nd ed., © 2013, Elsevier, Inc.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 65/97 SW datapath: Instruction fetch
[Schematic: the PC, clocked by CLK, supplies the instruction address to the A input of the Instruction Memory. The RD output is Instr, which fans out as the instruction fields Instr[31:26], Instr[5:0], Instr[25:21], Instr[20:16], Instr[20:16], Instr[15:11], and Instr[15:0].]
This is exactly the same as instruction fetch for LW! In fact, instruction fetch is the same for all instructions: how an instruction gets copied out of I-Mem does not depend on what kind of instruction it is!

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 66/97 SW datapath: GPR read and address calculation
[Schematic: Instr[25:21] drives the A1 input and Instr[20:16] drives the A2 input of the Register File. Instr[15:0] passes through Sign Extend. The ALU, with its 3-bit ALUControl input, produces Zero and ALUResult. The Register File RD2 output goes to the D-Mem WD input.]
The address calculation is exactly the same as in LW. But two registers must be read for SW: one to compute the address, and a second to supply the data to be stored. Signals involved in transferring that data are shown in red.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 67/97 SW datapath: Data memory and PC updates
A schematic for the data memory update can be sketched quickly by hand, so let's do that, and write down a few notes. Is the PC update for SW different in any way from the PC update for LW?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 68/97 2:1 bus multiplexers
Make sure you understand what these circuit elements do. We'll use them as key components in creating datapaths for R-type and BEQ instructions.
32-bit 2:1 bus mux, with select input S, data inputs A[31:0] (input 0) and B[31:0] (input 1), and output F[31:0]:
    F[31:0] = A[31:0] if S = 0
    F[31:0] = B[31:0] if S = 1
5-bit 2:1 bus mux, with select input S, data inputs C[4:0] (input 0) and D[4:0] (input 1), and output G[4:0]:
    G[4:0] = C[4:0] if S = 0
    G[4:0] = D[4:0] if S = 1
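
In a software model, both muxes reduce to one function with a width parameter. This is a sketch only; `mux2` and its parameter names are invented for illustration.

```python
def mux2(in0: int, in1: int, s: int, width: int = 32) -> int:
    """2:1 bus mux: pass in0 when s is 0, in1 when s is 1, on a bus of `width` wires."""
    mask = (1 << width) - 1
    return (in1 if s else in0) & mask


f = mux2(0x1111_1111, 0x2222_2222, s=1)       # 32-bit mux, selects input 1
g = mux2(0b10001, 0b01001, s=0, width=5)      # 5-bit mux, selects input 0
print(hex(f), bin(g))
```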

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 69/97 R-type instructions
The R-type instructions in our MIPS subset are ADD, SUB, AND, OR, and SLT. Why are they called R-type? The instruction format for R-type instructions is...

bits 31:26   bits 25:21     bits 20:16     bits 15:11   bits 10:6   bits 5:0
000000       source GPR 1   source GPR 2   dest. GPR    00000       funct field

The funct field is the part of the instruction that identifies which of ADD, SUB, AND, OR, or SLT should be performed.
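
To make the format concrete, here is a hedged sketch that assembles an R-type machine word from its fields. The function name and the dictionary are illustrative. The funct values are the standard MIPS encodings; the slides themselves show only the one for SLT (101010, on slide 83).

```python
# Standard MIPS funct codes for the five R-type instructions in the subset.
FUNCT = {
    "ADD": 0b100000,
    "SUB": 0b100010,
    "AND": 0b100100,
    "OR": 0b100101,
    "SLT": 0b101010,
}


def encode_rtype(name: str, rd: int, rs: int, rt: int) -> int:
    """Build an R-type word: opcode 000000, rs, rt, rd, 00000, funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT[name]


# SLT $t1, $t0, $zero: $t1 is GPR 9, $t0 is GPR 8, $zero is GPR 0.
word = encode_rtype("SLT", rd=9, rs=8, rt=0)
print(f"{word:032b}")
```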

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 70/97 A datapath for R-type instructions
Our goal will be to build a datapath for R-type instructions that is compatible with the datapath already set up for LW and SW. To do that, we'll need to use multiplexers to solve problems like these...
- In LW and SW, one ALU input is a GPR value and the other is a sign-extended offset. What should the ALU inputs be for R-type instructions?
- In LW, the R-File A3 input is Instr[20:16]. What should the R-File A3 input be for R-type instructions?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 71/97 GPR read and ALU use for LW, SW, and R-type
[Schematic: Instr[25:21] drives A1 and Instr[20:16] drives A2 of the Register File. A 2:1 mux controlled by ALUSrc selects between RD2 (input 0) and the sign-extended Instr[15:0] (input 1). The ALU, with its 3-bit ALUControl input, produces Zero and ALUResult. RD2 also goes to the D-Mem WD input.]
What should ALUSrc and ALUControl be for LW? For SW? For R-type instructions?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 72/97 R-File update for LW and R-type instructions
[Schematic: a 5-bit 2:1 mux controlled by RegDst selects the Register File A3 input from Instr[20:16] (input 0) or Instr[15:11] (input 1). A second 2:1 mux controlled by MemtoReg selects the WD3 input from ALUResult (input 0) or the RD output from D-Mem (input 1).]
What should RegDst and MemtoReg be for LW? For R-type instructions? What about SW?
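
The two muxes on this slide can be modelled as follows. This is a sketch under the assumption that the mux inputs are numbered as in Figure 7.11 (input 0 is Instr[20:16] or ALUResult, input 1 is Instr[15:11] or the D-Mem RD output); the function name `write_back` is made up. Try working out the control values by hand before reading the example calls.

```python
def write_back(instr: int, alu_result: int, read_data: int,
               reg_dst: int, mem_to_reg: int) -> tuple:
    """Return (register number for A3, 32-bit value for WD3)."""
    rt = (instr >> 16) & 0x1F                 # Instr[20:16]
    rd = (instr >> 11) & 0x1F                 # Instr[15:11]
    write_reg = rd if reg_dst else rt         # RegDst mux
    result = read_data if mem_to_reg else alu_result   # MemtoReg mux
    return write_reg, result


# LW $s1, 0x1234($s0): destination comes from Instr[20:16], data from D-Mem.
print(write_back(0x8E11_1234, alu_result=0x1000_1234, read_data=0xCAFE,
                 reg_dst=0, mem_to_reg=1))
# SLT $t1, $t0, $zero: destination comes from Instr[15:11], data from the ALU.
print(write_back(0x0100_482A, alu_result=1, read_data=0,
                 reg_dst=1, mem_to_reg=0))
```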

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 73/97 PC update for R-type instructions Is the PC update for R-type instructions different in any way from the PC update for LW or SW? We have now covered seven instructions from our eight-instruction subset: LW, SW and five R-type instructions. The last instruction to cover is BEQ. As you may have suspected, handling BEQ adds complexity to the PC update logic!

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 74/97 BEQ instruction format and behaviour
Instruction format:

bits 31:26   bits 25:21     bits 20:16     bits 15:0
000100       source GPR 1   source GPR 2   offset

Behaviour:
    if source GPRs are equal
        PC = (PC + 4) + 4 × sign-extended offset
    else
        PC = PC + 4
We already have a datapath to compute PC + 4. We'll need to add features to get (PC + 4) + 4 × sign-extended offset.
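
The behaviour above translates almost directly into code. In this sketch, the names `next_pc` and `sign_extend16` are illustrative, and the starting PC value 0x00400020 is an arbitrary example address, not one taken from the slides. The offset 0xFFFA is the one used in the BEQ example on slide 85.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit pattern to a 32-bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF_0000 if value & 0x8000 else value


def next_pc(pc: int, offset16: int, gprs_equal: bool) -> int:
    """New PC for BEQ: branch target if the source GPRs are equal, else PC + 4."""
    pc_plus4 = (pc + 4) & MASK32
    if gprs_equal:
        # Shifting left by 2 multiplies the sign-extended offset by 4.
        return (pc_plus4 + (sign_extend16(offset16) << 2)) & MASK32
    return pc_plus4


print(hex(next_pc(0x0040_0020, 0xFFFA, gprs_equal=True)))
print(hex(next_pc(0x0040_0020, 0xFFFA, gprs_equal=False)))
```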

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 75/97 Datapath for BEQ instruction: design ideas We can multiply by 4 by doing a shift-left-2 (<< 2) of the Sign Extend output. Q1: How can we use the ALU to decide whether or not a branch should be taken? Q2: What does that say about the values of ALUSrc and ALUControl for a BEQ instruction? Q3: If a signal called Branch is 1 for BEQ, but 0 for LW, SW and R-type instructions, how can we use that signal to ensure correct PC updates for BEQ and all the other instructions? These ideas lead to the schematic on the next slide...

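ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 76/97 Datapath for BEQ instruction: schematic
[Schematic. Elements shown: the PC register (clocked by CLK) and the adder that adds 4; the Register File, with Instr[25:21] at A1 and Instr[20:16] at A2; the 2:1 mux controlled by ALUSrc; the ALU with its 3-bit ALUControl input and Zero output; Sign Extend of Instr[15:0], followed by << 2 and a second adder; the Branch signal; and a 2:1 mux at the input of the PC register.]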

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 77/97 Datapath for BEQ instruction: mux for input to PC
To understand how BEQ is handled it may help to zoom in on a small but critical part of the schematic from the previous slide:
[Schematic: a 2:1 mux with inputs 0 and 1 and a select input marked "???"; the mux output is the input to the PC register, which is clocked by CLK.]
For each of the mux inputs, let's write a brief but precise description.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 78/97 Outline of Slide Set 6
- Introduction to Chapter 7 of the textbook
- Components of synchronous, sequential logic systems
- The PC, Register File, and Memory Units
- A model for Register File internals
- Textbook Section 7.2: Performance Analysis
- Single-cycle processor: Overview
- Details of datapaths for the single-cycle machine
- Control for the single-cycle machine
- Single-cycle timing example: LW instruction
- More instructions, and next steps

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 79/97 Control for the single-cycle machine A review of definitions for datapath and control... Datapath: A collection of circuit elements, connected in a way that will generate a result for some category of instructions. For example, the datapath for an LW instruction will include PC, I-Mem, R-File, D-Mem and a few other important elements. Control: A circuit element designed to send signals to datapath elements, to tell those elements what to do and sometimes when to do it.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 80/97 Single-cycle control: Inputs and outputs (1)
The control unit for our single-cycle processor will be combinational logic. The inputs to the control unit describe what kind of instruction is being executed. Which bits from the current instruction must be supplied as inputs to the control unit?
The outputs of the control unit will be six 1-bit signals (MemtoReg, MemWrite, Branch, ALUSrc, RegDst, and RegWrite) and one 3-bit signal (ALUControl). Let's make some notes about all of the control unit outputs.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 81/97 Single-cycle control: Inputs and outputs (2)
[Symbol: Control Unit with inputs opcode (Instr[31:26]) and funct (Instr[5:0]); outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 3-bit ALUControl.]
If we implemented this as a ROM circuit, what would be the dimensions of the ROM array? (See textbook Section 5.5.6 to review ROMs.)

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 82/97 Single-cycle control: Split into two parts
[Schematic: Instr[31:26] feeds the Main Decoder, which produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp signal. ALUOp and Instr[5:0] feed the ALU Decoder, which produces the 3-bit ALUControl.]
Let's write some rules for the 2-bit ALUOp signal. What are the dimensions for each part, if each of the two parts is a ROM?
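
One possible set of rules for the ALU Decoder is sketched below. It uses the ALUOp encodings from the Main Decoder table on slide 86 (00 for LW and SW, 01 for BEQ, 10 for R-type) and the ALUControl patterns from slide 54. The funct codes are the standard MIPS ones, and the function name is an assumption made for this example.

```python
# funct field (Instr[5:0]) -> ALUControl, for the five R-type instructions
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # ADD
    0b100010: 0b110,  # SUB
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # SLT
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Return the 3-bit ALUControl value for a given ALUOp and funct field."""
    if alu_op == 0b00:
        return 0b010                        # LW, SW: add to form the address
    if alu_op == 0b01:
        return 0b110                        # BEQ: subtract to compare the GPRs
    if alu_op == 0b10:
        return FUNCT_TO_ALU_CONTROL[funct]  # R-type: funct picks the operation
    raise ValueError("ALUOp pattern not used in this design")


print(f"{alu_decoder(0b10, 0b101010):03b}")   # SLT
```

For ALUOp values 00 and 01 the funct argument is ignored, which matches the fact that LW, SW, and BEQ have no funct field.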

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 83/97 SLT $t1, $t0, $zero
machine code is... 000000 01000 00000 01001 00000 101010
[Schematic: Instr[31:26] feeds the Main Decoder, which produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp signal. ALUOp and Instr[5:0] feed the ALU Decoder, which produces the 3-bit ALUControl.]
For this example SLT instruction, what does the Main Decoder do? What does the ALU Decoder do?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 84/97 LW $s1, 0x1234($s0)
machine code is... 100011 10000 10001 0001 0010 0011 0100
[Schematic: Instr[31:26] feeds the Main Decoder, which produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp signal. ALUOp and Instr[5:0] feed the ALU Decoder, which produces the 3-bit ALUControl.]
For this example LW instruction, what does the Main Decoder do? What does the ALU Decoder do?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 85/97 BEQ $t9, $zero, [6 instructions back]
machine code is... 000100 11001 00000 1111 1111 1111 1010
[Schematic: Instr[31:26] feeds the Main Decoder, which produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp signal. ALUOp and Instr[5:0] feed the ALU Decoder, which produces the 3-bit ALUControl.]
For this example BEQ instruction, what does the Main Decoder do? What does the ALU Decoder do?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 86/97 Complete specification for Main Decoder within the Control Unit of the Figure 7.11 computer

Instruction   RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp
R-type        1          1        0        0        0          0          10
LW            1          0        1        0        0          1          00
SW            0          X        1        0        1          X          00
BEQ           0          X        0        1        0          X          01

Exercise: Make a blank version of this table, then fill it in by looking at Figure 7.11 and deciding what all the signal values should be.
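
Because the Main Decoder is combinational, it can be modelled as a lookup table indexed by opcode. The sketch below transcribes the table above; the opcode bit patterns come from the instruction-format slides, `None` stands for a don't-care (X), and the names `MAIN_DECODER` and `main_decoder` are illustrative.

```python
def _row(reg_write, reg_dst, alu_src, branch, mem_write, mem_to_reg, alu_op):
    """Bundle one table row into a dict keyed by signal name."""
    return {
        "RegWrite": reg_write, "RegDst": reg_dst, "ALUSrc": alu_src,
        "Branch": branch, "MemWrite": mem_write, "MemtoReg": mem_to_reg,
        "ALUOp": alu_op,
    }


X = None  # don't-care entry

MAIN_DECODER = {
    0b000000: _row(1, 1, 0, 0, 0, 0, 0b10),  # R-type
    0b100011: _row(1, 0, 1, 0, 0, 1, 0b00),  # LW
    0b101011: _row(0, X, 1, 0, 1, X, 0b00),  # SW
    0b000100: _row(0, X, 0, 1, 0, X, 0b01),  # BEQ
}


def main_decoder(opcode: int) -> dict:
    """Return the Main Decoder outputs for a 6-bit opcode (Instr[31:26])."""
    return MAIN_DECODER[opcode]


print(main_decoder(0b100011))  # control signals for LW
```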

Sketch of timing for LW $s1, 0x1234($s0) slide 88/97
[Timing diagram. Waveforms, top to bottom: CLK, PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, $s1 contents. The diagram is then repeated with time points numbered 1 to 7.]

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 90/97 What happens as we adjust the clock period?
Which clock speeds work for LW, and which ones do not?
[Timing sketches for a fast clock, a slow clock, and a medium clock, each with time points numbered 1 to 6.]

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 91/97 Detailed timing analysis for the single-cycle machine
To study this material, it may be useful to review textbook Sections 2.9 and 3.5 on timing. We're going to follow the notation and presentation of textbook Section 7.3.4, and make the following assumptions:
- reading the R-File (t_RFread) takes longer than sign-extend and a mux combined (as stated in the textbook);
- reading the R-File (t_RFread) takes longer than generating Control Unit outputs (assumed but not actually stated).
We'll look at the critical path for an LW instruction.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 92/97
After a rising clock edge, there is a delay of up to t_pcq_PC (PC clock-to-q propagation delay) until the PC output is ready. Once the PC is ready, the critical path for LW will run through 5 units: I-Mem, R-File, ALU, D-Mem, and the mux controlled by MemtoReg.
The R-File and D-Mem act like combinational logic when they're read, so the overall propagation delay through the 5 units is just the sum of 5 individual delays:
    t_mem + t_RFread + t_ALU + t_mem + t_mux
It's assumed that I-Mem and D-Mem have the same delay, t_mem, so the overall combinational delay simplifies to
    2 t_mem + t_RFread + t_ALU + t_mux

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 93/97
It's assumed that R-File updates will work correctly if its WD3 (write data) input is ready no later than t_RFsetup (R-File setup time) in advance of a rising clock edge. So for safe operation of an LW instruction, with clock period T_C:
    T_C - t_RFsetup ≥ t_pcq_PC + 2 t_mem + t_RFread + t_ALU + t_mux
    T_C ≥ t_pcq_PC + 2 t_mem + t_RFread + t_ALU + t_mux + t_RFsetup
If that isn't clear, please study Section 7.3.4 carefully. There will be a lab exercise or two to help get you comfortable with this kind of timing analysis.
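
The minimum safe clock period for LW can be worked out numerically. The delay values in this sketch are illustrative numbers in picoseconds chosen for the example; they are not taken from the slides, and the function name is likewise made up.

```python
def min_clock_period_ps(t_pcq_pc, t_mem, t_rf_read, t_alu, t_mux, t_rf_setup):
    """Smallest clock period, in ps, that keeps the LW critical path safe."""
    return t_pcq_pc + 2 * t_mem + t_rf_read + t_alu + t_mux + t_rf_setup


# ASSUMPTION: illustrative element delays in picoseconds.
delays_ps = {
    "t_pcq_pc": 30,     # PC clock-to-q propagation delay
    "t_mem": 250,       # I-Mem or D-Mem read
    "t_rf_read": 150,   # R-File read
    "t_alu": 200,       # ALU
    "t_mux": 25,        # mux controlled by MemtoReg
    "t_rf_setup": 20,   # R-File setup time
}

t_c = min_clock_period_ps(**delays_ps)
print(f"minimum clock period: {t_c} ps")
print(f"maximum clock frequency: {1e6 / t_c:.0f} MHz")
```

With these numbers the two memory reads account for more than half of the clock period, which is typical of why memory delay dominates single-cycle timing.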

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 95/97 More instructions, and next steps
Textbook Section 7.3.3 looks at adding support for ADDI and J instructions to the single-cycle design. We won't spend lecture time on that, but we'll look at supporting ADDI, J, and perhaps some other instructions, in lab exercises.

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 96/97 Moving on
What determines the minimum safe clock period in the single-cycle machine? We've just seen that, roughly speaking, it's the SUM of the response times of several datapath elements.
Idea: Could there be a different design that does not require such a long cascade of events in a single clock cycle? Could that allow a much shorter clock period?

ENCM 369 Winter 2018 Section 01 Slide Set 6 slide 97/97 Final comments on the single-cycle processor
Would it work if you built it? Yes! The only missing detail from Section 7.3 of the textbook is logic to properly initialize the PC when the system is powered up. The PC must start with a specific instruction address, not some random bit pattern in its flip-flops.
It's quite cool to realize that between last September, at the start of ENEL 353, and now, you've learned enough about digital logic, assembly language and machine code to truly understand a simple but real computer design!
However, the simplicity of the single-cycle design has some disadvantages, so in the next slide set we're going to look at a more complex design.