A few questions to test your familiarity with Lab 7 after finishing all assigned parts of Lab 7


EE457 Lab 7 Questions page

1. A. In which parts or subparts of Lab 7 does the STALL signal cause the entire pipeline to be stalled? (Part 1, Part 2, P3_SP1, P3_SP2, P3_SP3, P3_SP4)

1. B. In which parts or subparts of Lab 7 does the STALL signal cause only part of the pipeline to be stalled? (Part 1, Part 2, P3_SP1, P3_SP2, P3_SP3, P3_SP4)

1. C. Are both of the above stall signals initiated by RAW dependencies that cannot be resolved by forwarding, or do some of the Lab 7 designs have no such RAW dependencies? If so, why are you stalling in those cases?

2. In our Lab 6, both HDU and HDU_Br are in the ID stage. Miss (Bruin/Trojan) says that the HDU [unlike the HDU_Br, which serves the branch instruction (which insists on receiving all forwarding help in the ID stage)] can be moved to the EX stage.

3. A. The STALL of Q#1.A. above happens in the (RF/EX1/EX2/EX12/EX2WB) stage in our design. Note that the ID stage of the CPU pipeline is called the RF stage here, as there is no instruction decoding in Lab 7 because of the one-hot coding of its opcodes. We just fetch the source registers here, and hence it is called the Register Fetch (RF) stage. The stall cannot be moved to take place in any other stage. T / F. If you answered False, state to which stage (or stages) it can be moved. If you answered True, state why it cannot be moved.

3. B. The STALL of Q#1.B. above happens in the (RF/EX1/EX2/EX12/EX2WB) stage in our design. The stall cannot be moved to take place in any other stage. T / F / It depends. Explain.

4. Unlike in our Lab 6, there is no need to have a Wrist Band Flip Flop (WB_FF) in any part of Lab 7. T / F. Explain your answer.

Please look at the solutions only after you have mentally answered all the questions.

EE457 Lab 7 Answers page

1. A. In which parts or subparts of Lab 7 does the STALL signal cause the entire pipeline to be stalled? (Part 1, Part 2, P3_SP1, P3_SP2, P3_SP3, P3_SP4)

Note: In Part 2 we assumed that the clock is wide enough to combine EX2 and WB into EX2WB.

1. B. In which parts or subparts of Lab 7 does the STALL signal cause only part of the pipeline to be stalled? (Part 1, Part 2, P3_SP1, P3_SP2, P3_SP3, P3_SP4)

1. C. Are both of the above stall signals initiated by RAW dependencies that cannot be resolved by forwarding, or do some of the Lab 7 designs have no such RAW dependencies? If so, why are you stalling in those cases?

In P3_SP2 and P3_SP4, there are no RAW dependencies that cannot be solved by forwarding. We stall the entire pipeline for one clock when ADD1 enters the EX12 stage, giving it one extra clock to perform both the SUB3 and ADD4 operations. The entire pipe must be stalled because we do not want the instruction in the WB stage, which could be helping the instruction in EX12, to leave the WB stage.

Some students think that the help from the WB stage is needed only in the first clock, when we subtract 3 from the source register (which may depend on the instruction in the WB stage), and that adding 4 to the result produced by SUB3 no longer needs the help from WB. In fact, SUB3 is combinational logic with some very quick paths and some very slow paths. If the WB instruction is allowed to leave, the help vanishes in the 2nd clock, and the garbage help (inappropriate help or no help) from the WB stage quickly invalidates the SUB3 output through the short paths in SUB3. The output pins of SUB3 (or, for that matter, of any piece of combinational logic) cannot hold data for a clock. So the senior in WB must be stalled whenever EX12 is stalled.

2. In our Lab 6, both HDU and HDU_Br are in the ID stage. Miss (Bruin/Trojan) says that the HDU [unlike the HDU_Br, which serves the branch instruction (which insists on receiving all forwarding help in the ID stage)] can be moved to the EX stage.

A multi-source instruction, such as the subu below, that depends on its two seniors for its sources cannot be stalled in the EX stage, because it cannot hold on to the $3 help from Senior #2 (addu) until it receives the $2 help from Senior #1 (lw) in the next clock.

    addu $3, $3, $3
    lw   $2, 40($8)
    subu $1, $2, $3
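
The timing argument in the answer to question 2 can be checked with a small Python model. This is only a sketch under stated assumptions, not the Lab 6 design: it assumes a classic 5-stage pipeline (IF, ID, EX, MEM, WB), forwarding into EX from a senior in MEM or WB (from WB only for a load), a register file that writes in the first half of a clock and reads in the second half, and an ID/EX register that simply keeps its stale contents while EX is stalled. The names `Senior`, `source_ok`, and `cycles_with_all_sources` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Senior:
    """A producer instruction ahead of the dependent instruction (hypothetical model)."""
    name: str
    dest: str
    if_cycle: int   # clock in which the instruction is in IF
    is_load: bool

    @property
    def wb_cycle(self) -> int:
        return self.if_cycle + 4  # IF, ID, EX, MEM, WB

    def forward_cycles(self) -> set:
        """Clocks in which this senior can forward its result into EX."""
        mem_cycle = self.if_cycle + 3
        # Assumption: a load has its data only once it reaches WB.
        return {self.wb_cycle} if self.is_load else {mem_cycle, self.wb_cycle}


def source_ok(reg, ex_cycle, last_id_cycle, seniors):
    """True if the dependent instruction can obtain `reg` while in EX during ex_cycle."""
    producer = next(s for s in seniors if s.dest == reg)
    from_forwarding = ex_cycle in producer.forward_cycles()
    # Assumption: a register file read in ID sees a value written in the same clock.
    from_regfile = producer.wb_cycle <= last_id_cycle
    return from_forwarding or from_regfile


def cycles_with_all_sources(sources, ex_cycles, last_id_cycle, seniors):
    """EX clocks in which every source operand is available at the same time."""
    return [c for c in ex_cycles
            if all(source_ok(r, c, last_id_cycle, seniors) for r in sources)]


# addu is fetched in clock 1, lw in clock 2, subu in clock 3.
SENIORS = [Senior("addu", "$3", 1, False), Senior("lw", "$2", 2, True)]
SOURCES = ["$2", "$3"]  # sources of subu $1, $2, $3

# Stall in ID: subu sits in ID during clocks 4 and 5, then executes in clock 6.
stall_in_id = cycles_with_all_sources(SOURCES, [6], 5, SENIORS)
# Stall in EX: subu leaves ID after clock 4 and sits in EX during clocks 5 and 6.
stall_in_ex = cycles_with_all_sources(SOURCES, [5, 6], 4, SENIORS)

print("stall in ID:", stall_in_id)
print("stall in EX:", stall_in_ex)
```

Under these assumptions, stalling in ID leaves a clock in which both operands are available, while stalling in EX does not: the $3 help is present only in clock 5 and the $2 help only in clock 6.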

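Returning to the answer to 1.C, the point that combinational logic cannot hold data across clocks can be illustrated the same way. The Python sketch below is a behavioral toy, not the lab's hardware description: it assumes the EX12 result is captured at the end of the second clock, and the function `ex12_add1` and the sample operand values are hypothetical.

```python
def sub3(x):
    """Combinational SUB3: the output follows the current input, nothing is stored."""
    return x - 3


def add4(x):
    """Combinational ADD4."""
    return x + 4


def ex12_add1(operand_per_clock):
    """Model ADD1 spending one clock per list entry in EX12.

    The SUB3 -> ADD4 chain is re-evaluated from whatever operand is
    presented in each clock; only the value present in the final clock
    is captured by the stage register.
    """
    result = None
    for operand in operand_per_clock:
        result = add4(sub3(operand))
    return result


# WB senior stalled along with EX12: the forwarded value 10 stays for both clocks.
held = ex12_add1([10, 10])
# WB senior allowed to leave: the forwarded value is replaced by garbage (0 here).
lost = ex12_add1([10, 0])

print(held, lost)
```

With the help held for both clocks the captured result is the operand plus one; once the help vanishes in the second clock, the captured result reflects the garbage input instead.
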
3. A. The STALL of Q#1.A. above happens in the (RF/EX1/EX2/EX12/EX2WB) stage in our design. Note that the ID stage of the CPU pipeline is called the RF stage here, as there is no instruction decoding in Lab 7 because of the one-hot coding of its opcodes. We just fetch the source registers here, and hence it is called the Register Fetch (RF) stage. The stall cannot be moved to take place in any other stage. T / F. If you answered False, state to which stage (or stages) it can be moved. If you answered True, state why it cannot be moved.

The ADD1 instruction needs extra time in EX12 but not in other stages.

3. B. The STALL of Q#1.B. above happens in the (RF/EX1/EX2/EX12/EX2WB) stage in our design. The stall cannot be moved to take place in any other stage. T / F / It depends. Explain.

Lab 7 Parts 1 and 2 use instructions with multiple sources. Hence the stall should happen in the RF (the ID) stage only, for the same reason stated in the answer to question #2 above. But in Lab 7 Part 3 Subparts #1 and #3, which have a single source register, the stall can be made to happen in the EX1 stage instead of the RF (the ID) stage. For a single-source instruction, the question of receiving part of the help and holding on to it while waiting for the other part does not arise. Performance-wise, it does not matter whether you stall the dependent instruction in the RF (ID) stage or the EX1 stage.

4. Unlike in our Lab 6, there is no need to have a Wrist Band Flip Flop (WB_FF) in any part of Lab 7. T / F. Explain your answer.

Unlike the Lab 6 IF/ID stage register, the IF/RF (the IF/ID) stage register in all parts of Lab 7 has one-hot control bits, as the opcode itself is one-hot coded and the 0000 opcode means a NOP. So just clearing the IF/RF register on reset ensures that the RF stage has a bubble to start with. There is no need for a WB_FF.
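
The answer to question 4 can be sketched in a few lines of Python. This is a minimal illustration only: the bit assignment in `ONE_HOT_OPCODES`, the placeholder name `OP4`, and the `reset_if_rf` and `decode` helpers are all hypothetical, and only the fact that opcode 0000 means a NOP comes from the lab.

```python
# Hypothetical one-hot assignment: exactly one bit is set per real instruction.
ONE_HOT_OPCODES = {
    0b0001: "ADD4",
    0b0010: "SUB3",
    0b0100: "ADD1",
    0b1000: "OP4",   # placeholder for a fourth instruction
}


def decode(opcode):
    """With one-hot coding, no bit set means no instruction is active: a NOP."""
    if opcode == 0b0000:
        return "NOP"
    return ONE_HOT_OPCODES[opcode]


def reset_if_rf():
    """Clearing the IF/RF stage register on reset zeroes every field, including the opcode."""
    return {"opcode": 0b0000, "src": 0, "dest": 0}


if_rf = reset_if_rf()
print(decode(if_rf["opcode"]))
```

Because the cleared register already decodes as a NOP, no extra flip-flop is needed to mark the RF stage as holding a bubble after reset.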
Some more detailed discussion of the WB_FF in the Lab 6 design:

During the system reset period, we want to make sure that all stages except the IF stage are filled with bubbles. In Lab 6, the IF/ID stage register has no control signals to clear using the /RESET signal. Hence we had to attach a Wrist Band FF (WB_FF). We either set or reset the WB_FF to mark the instruction in the ID stage as one to be treated as a NOP (bubble).

Note that there is no opcode assigned to a NOP instruction in the MIPS ISA. The 000000 opcode in MIPS is an R-type opcode, so it is not obvious that clearing the IF/ID register creates a bubble. It actually does, because the all-zero word is an R-type instruction with $0 as its destination.

Extract from http://web.cse.ohio-state.edu/~crawfis/cse675-02/Slides/MIPS%20Instruction%20Set.pdf

However, imagine a 7-stage pipeline where we have an IF1/IF2 stage register besides the IF2/ID stage register. Here we need a WB_FF associated with at least the IF1/IF2 register. So in EE457 we introduced the idea of the WB_FF starting from the 5-stage pipeline itself.
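
The observation that an all-zero MIPS word behaves as a bubble can be checked by splitting the word into its R-type fields. The Python sketch below uses the standard MIPS32 R-type field layout; the helper names `decode_fields` and `is_sll_to_zero` are hypothetical.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def is_sll_to_zero(word):
    """True for an R-type sll whose destination is $0, which changes no register."""
    f = decode_fields(word)
    return f["opcode"] == 0 and f["funct"] == 0 and f["rd"] == 0


cleared = 0x00000000      # contents of a cleared IF/ID register
addu_word = 0x00631821    # addu $3, $3, $3 (opcode 0, funct 0x21)

print(decode_fields(cleared), is_sll_to_zero(cleared))
print(decode_fields(addu_word), is_sll_to_zero(addu_word))
```

The cleared word decodes as sll $0, $0, 0, so it writes only to $0 and acts as a bubble, whereas other words with opcode 000000, such as the addu above, are ordinary R-type instructions.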