Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7

CM 69 W4 Section Slide Set 6

Slide Set 6 for CM 69 Winter 24 Lecture Section
Steve Norman, PhD, PEng
Electrical & Computer Engineering, Schulich School of Engineering, University of Calgary
Winter Term, 24

Contents
Textbook Section 7.2: Performance Analysis

Introduction to Chapter 7 of the textbook
This chapter is called Microarchitecture. It's about how computer processors actually read and execute instructions. So it's about hardware: digital logic circuits. It's not about programming.

Chapter 7 MIPS Instruction Subset
Much of the chapter focuses on several different logic circuit designs capable of running programs that use a small subset of the MIPS instruction set. The instructions in the subset are:
R-type instructions: ADD, SUB, AND, OR, SLT. (They're called R-type because all of their operands are registers.)
Other instructions: LW, SW, BEQ.
Some of the designs are later extended to support additional instructions, such as ADDI and J.

An outline of the first part of Chapter 7
7.1: Introduction. Includes descriptions of some key elements of processor designs: Program Counter (PC), Instruction Memory, Register File, Data Memory.
7.2: Performance Analysis. Brief discussion of how to measure and report computer system performance.
7.3: Single-Cycle Processor. A MIPS-subset processor that handles one instruction per long clock cycle, finishing each instruction before starting the next one.
7.4: Multi-Cycle Processor. A MIPS-subset processor that takes several short clock cycles to handle each instruction, again finishing each instruction before starting the next one.
This slide set is related to Sections 7.1 to 7.3.

Take time to do assigned reading in Chapter 7 carefully!
Many students may have been successful in CM 69 up to now without using the textbook very much. Things change with Chapters 7 and 8! It will be really hard to follow lecture, lab and tutorial material on processor designs without studying Chapter 7 of your textbook!

Components of synchronous, sequential logic systems
The simple computer designs we'll study are synchronous, sequential logic systems. Review of two important words from EL:
Sequential: A system in which the output depends not just on current input, but also on past input.
Synchronous: A system in which changes to the state bits occur all at the same time, in response to an active edge of a clock signal.
Let's look at some important components of synchronous, sequential processor designs.

Review of clock signals
[Figure: clock waveform against time, with rising and falling edges marked, period T_C, high time t_H and low time t_L]
We'll use the same model for a clock signal that we used in EL. Things to note:
A rising edge, also called a positive edge, is a low-to-high transition.
A falling edge, also called a negative edge, is a high-to-low transition.
The clock period is T_C = t_H + t_L; the frequency is 1/T_C.
t_H is not necessarily equal to t_L.

About the clock signal model
A model is a simplified description of a component that helps you understand and predict behaviour of a system that uses that component. The clock model we use is fine for CM 69 but not good enough for integrated circuit designers!
Real-world clock edges do not arrive at exactly the same time at all elements with clock inputs. That is called clock skew, which we covered in EL but will ignore in CM 69.
Also, real-world clock signals are not perfectly periodic.

An essential element: The D Flip-Flop
Positive-edge-triggered D flip-flop: the state Q copies the input D on each rising clock edge.
Negative-edge-triggered D flip-flop: the state Q copies the input D on each falling clock edge. (In the symbol, the bubble on the clock input indicates inversion of a signal.)
All of the DFFs we saw in EL in Fall 2 were positive-edge-triggered. We'll see later in Chapter 7 that sometimes it's useful to have state updates on negative clock edges.
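The edge-triggered behaviour described above can be sketched in a few lines of Python. This is only an illustrative model under the idealized clock assumptions of these slides (no skew, no delays); the class name `DFlipFlop`, the method `tick` and the sample waveforms are made up for this sketch and do not come from the textbook.

```python
class DFlipFlop:
    """Idealized edge-triggered D flip-flop (hypothetical model for illustration)."""

    def __init__(self, positive=True, q=0):
        self.positive = positive  # True: rising-edge triggered; False: falling-edge triggered
        self.q = q                # state bit, visible on the Q output
        self._clk = 0             # last clock level seen, used to detect edges

    def tick(self, clk, d):
        """Apply a new clock level and D value, then return Q."""
        rising = self._clk == 0 and clk == 1
        falling = self._clk == 1 and clk == 0
        if (self.positive and rising) or (not self.positive and falling):
            self.q = d            # D is captured only on the active edge
        self._clk = clk
        return self.q


def simulate(positive, clk_levels, d_values):
    """Return the list of Q values seen after each (clk, d) sample."""
    ff = DFlipFlop(positive=positive)
    return [ff.tick(c, d) for c, d in zip(clk_levels, d_values)]


# Made-up sample waveforms: two full clock cycles.
clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 1, 1, 0, 0, 1]
pos_q = simulate(True, clk, d)    # changes only where clk goes 0 -> 1
neg_q = simulate(False, clk, d)   # changes only where clk goes 1 -> 0
print(pos_q)
print(neg_q)
```

Between active edges, changes on D have no effect on Q, which is the point the slides stress: Q holds the captured value for the whole of the following clock cycle.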

D flip-flop: Example behaviour and a useful definition
For a few clock cycles with one positive-edge-triggered D flip-flop and one negative-edge-triggered D flip-flop, let's see how the Q outputs respond to changes in the D inputs.
Definition: Active clock edge means rising clock edge, for positive-edge-triggered DFFs; falling clock edge, for negative-edge-triggered DFFs.

D flip-flops: What's the point?
This is important! If you're not clear about this point, you will not really understand any of the circuits in Chapter 7 of the textbook!
A clock cycle is a span of time from one active edge of a clock to the next active edge.
A D flip-flop captures the value of the input bit D at the end of a clock cycle, and makes that captured bit value available on Q throughout the next clock cycle.

D flip-flops: Applications
Typical use in EL: Two or three D flip-flops are used to hold the state of a synchronous finite state machine.
Typical use in CM 69: A group of D flip-flops (usually but not always 32 of them) is used to form a register, the state of which can only be updated on an active clock edge.

A 32-bit register
[Figure: 32 D flip-flops side by side, and the compact register symbol with input D31:0 and output Q31:0]
This gets updated once per clock cycle, on positive clock edges. Each DFF receives the same CLK input. The full diagram shows the structure but is awkward to draw, so we'll use the compact symbol.

Do we want registers to do an update at the end of every single clock cycle?
Looking ahead a little: The answer is definitely yes for the PC register as used in Section 7.3, but no for the PC in Section 7.4, and not quite always for the PC in Section 7.5.
Looking ahead some more: yes for the pipeline registers introduced in Section 7.5.
What about updates for the MIPS GPRs (general-purpose registers)? Assume that we are designing a machine to handle one instruction per clock cycle.

Wires
A wire connects an output bit of some element to the input bit(s) of one or more elements. To keep things simple, we'll model signalling over wires as happening without delay. But keep in mind that in real-world design of high-speed circuits, accounting for wire delays can be very important.
Let's sketch some conventions for drawing wires and groups of wires.
Note: To reduce clutter, the textbook often uses a solid thick line for a multi-wire bus, instead of the usual bus notation.

Key elements of processor designs
PC (program counter): A special-purpose register used to hold an instruction address.
Register File: Contains the MIPS GPRs (general-purpose registers). The Register File is not at all like a file in the sense of files in folders in a file system!
Memory Units: So far in CM 69, our model for a computer has a single memory array, holding both instructions and data. But the simplest possible design for our MIPS subset requires split memory:
Instruction Memory, a container for machine code instructions;
Data Memory, written to by store instructions, and read from by load instructions.

The PC
This is a simple 32-bit register.
PC, the current value, is made up of the Q outputs of 32 DFFs.
PC', the next value, is a 32-bit signal applied to the D inputs of those DFFs.
Not shown: a reset input, to force the PC into a known state on system power-up; an enable input, for systems in which PC updates happen on some but not all rising edges of CLK.

Register File: Inputs and Outputs (1)
[Figure: Register File symbol with ports A1, A2, A3, WD3, WE3, RD1, RD2 and CLK]
Note the CLK input. This is a synchronous sequential element. State updates can happen only on rising edges of CLK.
A1, A2, and A3 are address inputs. They could also be called register select inputs. Why are they each 5 bits wide?

Instruction Memory
[Figure: Instruction Memory symbol with ports A and RD]
A is a 32-bit address input. RD is a 32-bit read data output.
Most real computers allow users to modify the programs that the computer can run. That's not allowed by this simple ROM (read-only memory) circuit. To change the program in our simple computer, you would have to pull a ROM chip out of the instruction memory socket, and replace it with a different ROM chip containing different instructions.

But the prof said, GPRs don't have addresses!
Let me be more precise: GPRs do not have main memory addresses.
You cannot make a C pointer point to a MIPS GPR.
You cannot use MIPS GPRs for array elements, because the necessary address arithmetic won't work.
However, a 5-bit address of a MIPS GPR really is an address, in the sense that it is a number that selects the GPR out of the set of 32 GPRs.

Register File: Inputs and Outputs (2)
[Figure: Register File symbol with ports A1, A2, A3, WD3, WE3, RD1, RD2 and CLK]
RD1 and RD2 are 32-bit outputs: read data ports.
WD3 is a 32-bit input: a write data port.
What is the relationship between the three data ports RD1, RD2 and WD3, the address inputs A1, A2, and A3, and the 1-bit write enable signal WE3?

Data Memory
[Figure: Data Memory symbol with ports A, RD, WD, WE and CLK]
Again, note the CLK input. This, like the Register File, is a synchronous sequential element. State updates can happen only on rising edges of CLK.
Let's make some notes about how this element behaves.

A slightly fancier Data Memory
Your instructor would prefer the Data Memory element to have two control inputs, EN (enable) and R/W (read/not-write):
EN   R/W   action
0    X     none
1    0     write to address A
1    1     read from address A
That would have avoided waste of energy in handling instructions that are neither loads nor stores. But we'll follow the textbook; the authors had the very reasonable goal of minimal clutter in schematics.

Abbrevs
To express ideas quickly and concisely, I will use the following abbreviations from time to time in slides and lecture notes:
I-Mem: Instruction Memory
R-File: Register File
D-Mem: Data Memory

How to build a Register File
The textbook suggests that register files are usually built as SRAM (static RAM) arrays. An SRAM-based design is small and efficient, but its operation is hard to explain in the context of year 2 EL and SF curriculum. (EL students: See CM 467 for SRAM details.) So I'll present a design in which registers are made of enabled D flip-flops.

32-bit register with EN input
[Figure: 32 enabled D flip-flops side by side, and the compact symbol with input D31:0, output Q31:0, and an EN input]
Each DFF receives the same EN input. On positive clock edges, all the DFFs copy D to Q if EN = 1; all the DFFs keep their old Q values if EN = 0.

The Register File has 32 GPRs
How many of the above 32-bit enabled registers will we need? How many enabled D flip-flops is that? How can we build GPR 0 ($zero)?

Register file building block: 5-to-32 decoder
[Figure: decoder symbol with inputs EN and A4:0 and outputs Y0 to Y31, with its truth table]
If EN ("enable") is 0, then all 32 output bits are 0.
If EN is 1, then one of the outputs is 1, as selected by the 5-bit number A4 A3 A2 A1 A0, and the other 31 outputs are 0.
This is just a larger version of the decoder-with-enable circuits (2-to-4, 3-to-8, 4-to-16) we saw several times in EL.
XXXXX in the first row of the truth table means this: If EN is 0, it doesn't matter what A4 to A0 are.

Register file write logic, supported by a 5-to-32 decoder
How should the decoder inputs be driven? Where will the 32 bits of output from the decoder go? Those choices will result in the R-File write logic shown in the figure.
[Figure: R-File write logic. The R-File WD3 input feeds the D inputs of the GPRs, and decoder outputs Y1, Y2, Y3, and so on, connect to GPR1, GPR2, GPR3, and so on.]
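As a rough software analogy for the enabled registers and decoder-based write logic described above, here is a minimal Python sketch. It rests on several assumptions that are labelled in the comments: the port names A1, A2, A3, WD3 and WE3 follow the textbook's register file symbol, the decoder's enable is driven by WE3 and its address by A3, and GPR 0 is modelled as hard-wired to zero. The class and function names are hypothetical.

```python
def decoder_5_to_32(en, a):
    """5-to-32 decoder with enable: returns 32 output bits, at most one of which is 1."""
    return [1 if (en == 1 and a == i) else 0 for i in range(32)]


class RegisterFile:
    """Hypothetical model: 32 enabled 32-bit registers plus decoder-driven write logic."""

    def __init__(self):
        self.gprs = [0] * 32

    def read(self, a1, a2):
        # Read logic is combinational: outputs simply follow the address inputs.
        return self.gprs[a1], self.gprs[a2]

    def clock_edge(self, a3, wd3, we3):
        # Assumption: WE3 drives the decoder's EN input and A3 drives its address
        # inputs, so at most one register sees EN = 1 on this active clock edge.
        enables = decoder_5_to_32(we3, a3)
        # Assumption: GPR 0 ($zero) is hard-wired to zero, so it is never written.
        for i in range(1, 32):
            if enables[i]:
                self.gprs[i] = wd3 & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(a3=8, wd3=0x12345678, we3=1)   # write GPR 8
rf.clock_edge(a3=9, wd3=0xDEADBEEF, we3=0)   # WE3 = 0: no register changes
rf.clock_edge(a3=0, wd3=0xFFFFFFFF, we3=1)   # attempted write to $zero is ignored
print([hex(v) for v in rf.read(8, 9)])
```

A hardware register file does all of this in parallel, of course; the loop is only a convenient way to express "each register looks at its own enable bit".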

Register file read logic (1)
With the goal of saving some time, we're not going to go into detail here. One possible arrangement is to use two (large!) 32-bit 32:1 bus multiplexers. The 5-bit select inputs to the bus muxes would be the A1 and A2 R-File inputs. The first bus mux would use A1 to select one of 32 32-bit GPR values to copy to the RD1 output of the R-File, and the second bus mux would do the same thing with A2 and RD2.

Register file read logic (2)
Reminder: the ports are A1, A2, A3, WD3, WE3, RD1, RD2 and CLK.
The write logic is sequential: a GPR update can only happen in response to an active clock edge.
The read logic is combinational: when A1 or A2 changes, RD1 or RD2 will change without waiting for a clock edge.

Textbook Section 7.2: Performance Analysis
The textbook presents this equation for the execution time of a program:
execution time = IC × CPI × T_C
IC is instruction count, the number of instructions executed in a program run. IC is not the size of a program! A single instruction can count many times in IC. For example, if a loop containing n instructions runs k times, the loop contributes n × k instructions to IC.
T_C is the processor clock period.
We'll look at CPI in more detail on other slides.

More about IC: Instruction count
For a program written in a language such as C, the factors affecting IC are: the ISA (instruction set architecture); how good the compiler is at translating pieces of high-level language into efficient sequences of instructions; what the program input is.
Microarchitecture can't do much about IC, but has a major impact on both CPI and T_C.

CPI: Clock cycles per instruction (1)
Low CPI is good for processor performance, and high CPI is bad.
Microarchitectures (arrangements of registers, memory systems, and arithmetic/logic circuits) have a major influence on CPI. Textbook Chapter 7 looks at three kinds of microarchitecture: single-cycle, with a CPI of 1; multi-cycle, with a CPI of approximately 4; pipelined, with a CPI just a little greater than 1.
But CPI is not a number determined exactly and entirely by microarchitecture.
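To make the performance equation concrete, here is a small Python sketch. All of the numbers (the loop size, the iteration count and both clock periods) are made up for illustration and are not taken from the slides or the textbook; only the formula execution time = IC × CPI × T_C and the rough CPI figures of 1 and 4 come from the text above.

```python
def loop_contribution(instructions_in_loop, iterations):
    """Instructions a loop adds to IC: each pass re-executes the same instructions."""
    return instructions_in_loop * iterations


def execution_time_ps(ic, cpi, t_c_ps):
    """execution time = IC x CPI x T_C, with T_C given in picoseconds."""
    return ic * cpi * t_c_ps


# Hypothetical program: a 5-instruction loop that runs 1000 times.
ic = loop_contribution(5, 1000)

# Hypothetical clock periods: one long cycle versus several short cycles.
single_cycle_ps = execution_time_ps(ic, cpi=1, t_c_ps=900)
multi_cycle_ps = execution_time_ps(ic, cpi=4, t_c_ps=300)

print(ic, single_cycle_ps, multi_cycle_ps)
```

With these invented numbers the multi-cycle design loses despite its shorter clock period, because its higher CPI more than cancels the gain. Different numbers could easily reverse that result, which is why CPI and T_C have to be considered together.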

CPI: Clock cycles per instruction (2)
CPI is not determined entirely by microarchitecture. In fact, CPI is also program- and data-dependent:
Certain programs, with certain inputs, result in execution of mostly easy, low-CPI instructions.
Other programs, and/or other inputs, result in execution of a higher concentration of hard, high-CPI instructions.
So CPI is a useful concept, but not a number you can precisely specify for any particular processor design.

T_C: Processor clock period
Here's a review of the performance equation:
execution time = IC × CPI × T_C
We'll see that there is a tradeoff between CPI and T_C: design ideas that reduce CPI tend to increase T_C; design ideas that reduce T_C tend to increase CPI.
Another unwelcome consideration is that decreasing T_C increases clock frequency (1/T_C), which increases average power consumption of a processor.

Let's start with definitions of datapath and control
Datapath: A collection of circuit elements, connected in a way that will generate a result for some category of instructions. For example, the datapath for an LW instruction will include PC, I-Mem, R-File, D-Mem and a few other important elements.
Control: A circuit element designed to send signals to datapath elements, to tell those elements what to do and sometimes when to do it. Example: A control circuit for our MIPS subset will need to turn on the WE input of D-Mem for SW, but turn it off for all other instructions.

One clock cycle in the single-cycle machine
This sketch shows how every instruction will work.
[Figure: one clock cycle on a time axis. At the first active edge: update to PC, maybe to R-File or D-Mem, from the previous instruction. During the cycle: the datapath generates the result(s) of the current instruction, followed by a "Result(s) ready" interval. At the next active edge: update to PC, maybe to R-File or D-Mem, from the current instruction.]
The width of the "Result(s) ready" time interval will differ between instructions, but must always be greater than zero for safe operation.

Datapath elements
The first datapath we'll look at is the datapath for LW. After that we'll move on to SW, R-type instructions, and BEQ.
Before we start on LW, we'll need a few more datapath elements: 32-bit adders, a 16-to-32-bit sign-extend unit, and a 32-bit ALU (arithmetic/logic unit).

32-bit adder circuit for Chapter 7
[Figure: adder symbol with inputs A31:0 and B31:0 and output Y31:0]
Things to note:
The carry-in to the LSB is 0.
There is no carry-out-from-MSB output.
As explained in previous lectures, this circuit works for both unsigned and signed addition, without any sort of input to indicate which of signed or unsigned computation is desired.

16-to-32-bit sign extend circuit
[Figure: Sign Extend symbol with a 16-bit input and a 32-bit output, and a conceptual implementation in which input bits 15 down to 0 drive output bits 15 down to 0, and input bit 15 also drives output bits 31 down to 16]
The symbol can be (conceptually) implemented as shown in the figure. (A practical circuit would probably include some buffers so that a single input wire would not have to drive 17 output wires.)

32-bit ALU (arithmetic/logic unit)
[Figure: ALU symbol with inputs A31:0, B31:0 and ALUControl, and outputs Y31:0 and Zero]
This combinational element has 67 input wires and 33 output wires.
5 of the 8 possible ALUControl input bit patterns will matter for our MIPS-subset processor designs:
ALUControl   Y        notes
000          A & B    bitwise AND
001          A | B    bitwise OR
010          A + B    addition
110          A - B    subtraction
111          A < B    0 for false, 1 for true
What important aspect of the set-on-less-than comparison does the table NOT specify?

The ALU Zero output
The 1-bit Zero signal has a confusing name! Sometimes the ALU will make Zero = 1 and sometimes the ALU will make Zero = 0. Let's write down the rules for how the Zero signal is computed.
Looking ahead: For our MIPS subset, for which instruction will the Zero signal be useful?
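The sign-extend unit and the ALU described above are both combinational, so each can be modelled as a pure function. The Python sketch below is illustrative only and makes three assumptions that the slide text leaves open: the 3-bit ALUControl encodings follow the textbook's ALU table, the set-on-less-than comparison treats A and B as signed two's-complement numbers, and the Zero output is 1 exactly when all 32 bits of Y are 0. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16_to_32(x):
    """Copy bit 15 of a 16-bit value into bits 31..16 of the 32-bit result."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if (x & 0x8000) else x


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if (x & 0x80000000) else x


def alu(a, b, alu_control):
    """Return (Y, Zero) for the five ALUControl patterns used by the MIPS subset."""
    a &= MASK32
    b &= MASK32
    if alu_control == 0b000:
        y = a & b                      # bitwise AND
    elif alu_control == 0b001:
        y = a | b                      # bitwise OR
    elif alu_control == 0b010:
        y = (a + b) & MASK32           # addition, carry out of the MSB is discarded
    elif alu_control == 0b110:
        y = (a - b) & MASK32           # subtraction, wraps around in 32 bits
    elif alu_control == 0b111:
        # Assumption: signed comparison, as for the MIPS SLT instruction.
        y = 1 if to_signed32(a) < to_signed32(b) else 0
    else:
        raise ValueError("ALUControl pattern not used by the MIPS subset")
    zero = 1 if y == 0 else 0          # assumption: Zero = 1 means Y is all zeros
    return y, zero


print(hex(sign_extend_16_to_32(0x8000)))
print(alu(0x2A, 0x2A, 0b110))
```

Note that subtracting two equal values yields Y = 0 and therefore Zero = 1, which hints at why a signal with this name is useful for an equality test.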

ALU examples
For each of the examples in the table, what will the ALU outputs Y and Zero be?
[Table: four examples, with columns example, A, B and ALUControl]

Details of ALU design
See textbook Section 24. CM 69 will not cover the details of ALU design down to the level of logic gates.

Back to the datapath for LW
We have looked at all of the necessary datapath elements. Before we try to organize those elements, let's review the machine code formats for LW and SW.
LW: bits 31:26 opcode; bits 25:21 pointer GPR; bits 20:16 dest GPR; bits 15:0 offset.
SW: bits 31:26 opcode; bits 25:21 pointer GPR; bits 20:16 source GPR; bits 15:0 offset.

LW datapath: Instruction fetch
[Figure: PC supplies the instruction address to the A input of Instruction Memory; the RD output is Instr, which is split into the instruction fields 31:26, 5:0, 25:21, 20:16, 15:11 and 15:0]
"Instr" is short for "instruction", obviously.
How many wires are there in total for instruction fields, and why is that number so much greater than 32?

LW datapath: GPR read and address calculation
[Figure: Instr25:21 drives the R-File A1 input; Instr15:0 goes through Sign Extend; RD1 and the sign-extended offset are the ALU inputs; the ALU output is ALUResult]
Which signal is the data memory address? What bit pattern should be applied to ALUControl? Why use an ALU at all? Wouldn't it be easier to use an adder?

LW datapath: D-Mem read and R-File update
[Figure: the address from the ALU drives the Data Memory A input; the Data Memory RD output drives the R-File WD3 input; Instr20:16 drives the R-File A3 input]
Note the role of instruction bits Instr20:16!
What are the correct values for the WE input to the D-Mem and the WE3 input to the R-File?

LW datapath: PC update
At the same time LW is doing its job of copying a word from Data Memory to the Register File, an update to the PC must be generated.
[Figure: PC goes to I-Mem and to an adder whose other input is 4; the adder output is PC']
What does the symbol 4 mean in this schematic?

RTFT: Read The Fine Textbook
Section 7.3 of the textbook explains the single-cycle datapaths for LW, SW, R-type and BEQ instructions in clear and careful detail, with schematics that are very difficult to squish into legible lecture slides. Please read this textbook material carefully! For the same reason, please be ready to carefully read other recommended sections of Chapter 7!
Historical note: RTFM, or Read The F****** Manual, is advice that experienced programmers have been handing out for decades.

[Figure: complete single-cycle processor with Control Unit. Labelled signals include PC', PC, PCPlus4, Instr, Op, Funct, MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, RegWrite, WriteReg4:0, SignImm, SrcA, SrcB, ALUResult, WriteData, PCBranch, PCSrc, ReadData and Result. Image is Figure 7 from Harris D M and Harris S L, Digital Design and Computer Architecture, 2nd ed, c 2, Elsevier, Inc]

SW datapath: Instruction fetch
[Figure: PC supplies the instruction address to the A input of Instruction Memory; the RD output is Instr, which is split into instruction fields]
This is exactly the same as instruction fetch for LW! In fact, instruction fetch is the same for all instructions: how an instruction gets copied out of I-Mem does not depend on what kind of instruction it is!

SW datapath: GPR read and address calculation
[Figure: Instr25:21 drives R-File A1 and Instr20:16 drives R-File A2; Instr15:0 goes through Sign Extend; RD1 and the sign-extended offset are the ALU inputs; RD2 goes to the D-Mem WD input]
The address calculation is exactly the same as in LW. But two registers must be read for SW: one to compute the address, and a second to supply the data to be stored. Signals involved in transferring that data are shown in red.

SW datapath: Data memory and PC updates
A schematic for the data memory update can be sketched quickly by hand, so let's do that, and write down a few notes.
Is the PC update for SW different in any way from the PC update for LW?

2:1 bus multiplexers
Make sure you understand what these circuit elements do. We'll use them as key components in creating datapaths for R-type and BEQ instructions.
32-bit 2:1 bus mux, with inputs A31:0, B31:0 and S, and output F31:0: F31:0 = A31:0 if S = 0; F31:0 = B31:0 if S = 1.
5-bit 2:1 bus mux, with inputs C4:0, D4:0 and S, and output G4:0: G4:0 = C4:0 if S = 0; G4:0 = D4:0 if S = 1.

R-type instructions
The R-type instructions in our MIPS subset are ADD, SUB, AND, OR, and SLT. Why are they called R-type?
The instruction format for R-type instructions is: bits 31:26 opcode; bits 25:21 source GPR 1; bits 20:16 source GPR 2; bits 15:11 dest GPR; bits 10:6; bits 5:0 funct field.
The funct field is the part of the instruction that identifies which of ADD, SUB, AND, OR, or SLT should be performed.

A datapath for R-type instructions
Our goal will be to build a datapath for R-type instructions that is compatible with the datapath already set up for LW and SW. To do that, we'll need to use multiplexers to solve problems like these:
In LW and SW, one ALU input is a GPR value and the other is a sign-extended offset. What should the ALU inputs be for R-type instructions?
In LW, the R-File A3 input is Instr20:16. What should the R-File A3 input be for R-type instructions?

GPR read and ALU use for LW, SW, and R-type
[Figure: Instr25:21 drives R-File A1 and Instr20:16 drives R-File A2; a 2:1 bus mux controlled by ALUSrc selects between RD2 and the sign-extended Instr15:0 as the second ALU input; RD2 also goes to the D-Mem WD input; the ALU output is ALUResult]
What should ALUSrc and ALUControl be for LW? For SW? For R-type instructions?

R-File update for LW and R-type instructions
[Figure: a 5-bit mux controlled by RegDst selects between Instr20:16 and Instr15:11 as the R-File A3 input; a 32-bit mux controlled by MemtoReg selects between ALUResult and the RD output from D-Mem as the R-File WD3 input]
What should RegDst and MemtoReg be for LW? For R-type instructions? What about SW?

PC update for R-type instructions
Is the PC update for R-type instructions different in any way from the PC update for LW or SW?
We have now covered seven instructions from our eight-instruction subset: LW, SW and five R-type instructions. The last instruction to cover is BEQ. As you may have suspected, handling BEQ adds complexity to the PC update logic!

BEQ instruction format and behaviour
Instruction format: bits 31:26 opcode; bits 25:21 source GPR 1; bits 20:16 source GPR 2; bits 15:0 offset.
Behaviour:
if source GPRs are equal, PC = (PC + 4) + 4 × sign-extended offset
else PC = PC + 4
We already have a datapath to compute PC + 4. We'll need to add features to get (PC + 4) + 4 × sign-extended offset.

Datapath for BEQ instruction: design ideas
We can multiply by 4 by doing a shift-left-2 (<< 2) of the Sign Extend output.
Q1: How can we use the ALU to decide whether or not a branch should be taken?
Q2: What does that say about the values of ALUSrc and ALUControl for a BEQ instruction?
Q3: If a signal called Branch is 1 for BEQ, but 0 for LW, SW and R-type instructions, how can we use that signal to ensure correct PC updates for BEQ and all the other instructions?
These ideas lead to the schematic on the next slide.

Datapath for BEQ instruction: schematic
[Figure: PC and an adder with 4 produce PC + 4; Instr25:21 and Instr20:16 drive R-File A1 and A2; RD1 and RD2 go to the ALU, controlled by ALUSrc and ALUControl; Instr15:0 goes through Sign Extend and << 2 to a second adder; Branch and the ALU Zero output control a mux that drives PC']

Datapath for BEQ instruction: mux for input to PC
To understand how BEQ is handled it may help to zoom in on a small but critical part of the schematic from the previous slide: the mux whose output is PC'.
For each of the mux inputs, let's write a brief but precise description.

A review of definitions for datapath and control
Datapath: A collection of circuit elements, connected in a way that will generate a result for some category of instructions. For example, the datapath for an LW instruction will include PC, I-Mem, R-File, D-Mem and a few other important elements.
Control: A circuit element designed to send signals to datapath elements, to tell those elements what to do and sometimes when to do it.
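The BEQ rule for the next PC value can be written out as a short Python sketch, which also shows one way datapath and control signals can meet. The signal names PCPlus4, SignImm, PCBranch and PCSrc are borrowed from the labels in the textbook's single-cycle schematic. Forming PCSrc as Branch AND Zero is an assumption about how the textbook's design resolves the Branch question, and the function name and example addresses are made up.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16_to_32(x):
    """Copy bit 15 of a 16-bit value into bits 31..16 of the 32-bit result."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if (x & 0x8000) else x


def next_pc(pc, branch, zero, imm16):
    """Compute PC' from the current PC, two 1-bit signals and the 16-bit offset field."""
    pc_plus4 = (pc + 4) & MASK32
    sign_imm = sign_extend_16_to_32(imm16)
    # Shift-left-2 multiplies the offset by 4; the adder wraps around in 32 bits.
    pc_branch = (pc_plus4 + ((sign_imm << 2) & MASK32)) & MASK32
    # Assumption: the branch is taken only when the instruction is BEQ (Branch = 1)
    # and the ALU reports equal operands (Zero = 1).
    pc_src = branch & zero
    return pc_branch if pc_src else pc_plus4


# Hypothetical example: a BEQ at address 0x00400020 with offset field -6 (0xFFFA).
taken = next_pc(0x00400020, branch=1, zero=1, imm16=0xFFFA)
not_taken = next_pc(0x00400020, branch=1, zero=0, imm16=0xFFFA)
not_a_branch = next_pc(0x00400020, branch=0, zero=1, imm16=0xFFFA)
print(hex(taken), hex(not_taken), hex(not_a_branch))
```

The third call illustrates why Zero alone cannot drive the mux: a non-branch instruction such as SUB can produce Zero = 1 and must still advance to PC + 4.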

Single-cycle control: Inputs and outputs (1)
The control unit for our single-cycle processor will be combinational logic.
The inputs to the control unit describe what kind of instruction is being executed. Which bits from the current instruction must be supplied as inputs to the control unit?
The outputs of the control unit will be six 1-bit signals (MemtoReg, MemWrite, Branch, ALUSrc, RegDst, and RegWrite) and one 3-bit signal, ALUControl.
Let's make some notes about all of the control unit outputs.

Single-cycle control: Inputs and outputs (2)
[Figure: Control Unit with inputs Instr31:26 (opcode) and Instr5:0 (funct), and outputs MemtoReg, MemWrite, Branch, ALUControl, ALUSrc, RegDst and RegWrite]
If we implemented this as a ROM circuit, what would be the dimensions of the ROM array? (See textbook Section 6 to review ROMs.)

Single-cycle control: Split into two parts
[Figure: the Main Decoder takes Instr31:26 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite and the 2-bit ALUOp; the ALU Decoder takes ALUOp and Instr5:0 and produces ALUControl]
Let's write some rules for the 2-bit ALUOp signal.
What are the dimensions for each part, if each of the two parts is a ROM?

Control example: SLT $t, $t, $zero
For this example SLT instruction, what does the Main Decoder do? What does the ALU Decoder do?

Control example: LW $s, x24($s)
For this example LW instruction, what does the Main Decoder do? What does the ALU Decoder do?

Control example: BEQ $t9, $zero, [6 instructions back]
For this example BEQ instruction, what does the Main Decoder do? What does the ALU Decoder do?

Complete specification for Main Decoder within the Control Unit of the Figure 7 computer
[Table: one row for each of R-type, LW, SW and BEQ, with columns Instruction, RegWrite, RegDst, ALUSrc, Branch, MemWrite, MemtoReg and ALUOp. The SW and BEQ rows each contain two don't-care (X) entries.]
Exercise: Make a blank version of this table, then fill it in by looking at Figure 7 and deciding what all the signal values should be.

Sketch of timing for LW $s, x24($s)
[Figure: timing sketch with rows for PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, and $s contents]

What happens as we adjust the clock period?
Which clock speeds work for LW, and which ones do not?
[Figure: three timing diagrams, for a fast clock, a medium clock and a slow clock]

Detailed timing analysis for the single-cycle machine
To study this material, it may be useful to review the textbook sections on timing.
We're going to follow the notation and presentation of textbook Section 74, and make the following assumptions:
reading the R-File (t_RFread) takes longer than sign-extend and a mux combined (as stated in the textbook);
reading the R-File (t_RFread) takes longer than generating Control Unit outputs (assumed but not actually stated).
We'll look at the critical path for an LW instruction.

Critical path for LW
After a rising clock edge, there is a delay of up to t_pcq_PC (PC clock-to-Q propagation delay) until the PC output is ready.
Once the PC is ready, the critical path for LW will run through 5 units: I-Mem, R-File, ALU, D-Mem, and the mux controlled by MemtoReg.
The R-File and D-Mem act like combinational logic when they're read, so the overall propagation delay through the 5 units is just the sum of individual delays:
t_mem + t_RFread + t_ALU + t_mem + t_mux
It's assumed that I-Mem and D-Mem have the same delay, t_mem, so the overall combinational delay simplifies to
2 t_mem + t_RFread + t_ALU + t_mux

Minimum clock period for LW
It's assumed that R-File updates will work correctly if its WD3 (write data) input is ready no later than t_RFsetup (R-File setup time) in advance of a rising clock edge. So for safe operation of an LW instruction:
T_C - t_RFsetup ≥ t_pcq_PC + 2 t_mem + t_RFread + t_ALU + t_mux
T_C ≥ t_pcq_PC + 2 t_mem + t_RFread + t_ALU + t_mux + t_RFsetup
If that isn't clear, please study Section 74 carefully. Lab 7 will have an exercise or two to help get you comfortable with this kind of timing analysis.

Support for more instructions
Textbook Section 7 looks at adding support for ADDI and J instructions to the single-cycle design. We won't spend lecture time on that, but we'll look at supporting ADDI, J, and perhaps some other instructions, in Lab 7 or 8.

Moving on
What determines the minimum safe clock period in the single-cycle machine? We've just seen that, roughly speaking, it's the SUM of the response times of several datapath elements.
Idea: Could there be a different design that does not require such a long cascade of events in a single clock cycle? Could that allow a much shorter clock period?
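As a worked instance of the LW timing constraint, the Python sketch below adds up the delays along the critical path. The formula is the one stated in the slides; the delay values are purely illustrative numbers in picoseconds, chosen for this example, and the function name is hypothetical.

```python
def min_clock_period_lw(t_pcq_pc, t_mem, t_rf_read, t_alu, t_mux, t_rf_setup):
    """Smallest safe T_C for LW: PC clock-to-Q delay, then I-Mem, R-File read,
    ALU, D-Mem and the MemtoReg mux, plus the R-File setup time."""
    return t_pcq_pc + 2 * t_mem + t_rf_read + t_alu + t_mux + t_rf_setup


# Illustrative delays in picoseconds (assumed values, not measurements).
t_c_ps = min_clock_period_lw(
    t_pcq_pc=30,
    t_mem=250,
    t_rf_read=150,
    t_alu=200,
    t_mux=25,
    t_rf_setup=20,
)
f_max_ghz = 1000 / t_c_ps   # 1 / T_C, converting picoseconds to gigahertz

print(t_c_ps, round(f_max_ghz, 3))
```

With these numbers the two memory reads account for more than half of the clock period, which is one way to see why a design that spreads the work over several shorter cycles might be attractive.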