Pipelining. Improve performance by increasing instruction throughput.


Chapter 6

Pipelining
Improve performance by increasing instruction throughput.
[Figure: three lw instructions in program execution order. Without pipelining, each instruction takes 8 ns (instruction fetch, ALU, data access) and the next one starts only when the previous one has finished. With pipelining, a new instruction starts every 2 ns.]
Ideal speedup is the number of stages in the pipeline. Do we achieve this?
Electrical & Computer Engineering, The College of New Jersey
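The timing in the figure can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original slides: the function names are made up for illustration, and the five-stage count is taken from the later "Pipeline control" slide. It uses the 8 ns unpipelined instruction time and the 2 ns pipeline clock from the figure.

```python
def unpipelined_ns(n, instr_ns=8):
    """Each instruction runs to completion before the next one starts."""
    return n * instr_ns


def pipelined_ns(n, stage_ns=2, n_stages=5):
    """The first instruction fills the pipeline; after that one completes per cycle."""
    return (n_stages + n - 1) * stage_ns


def speedup(n):
    """Ratio of unpipelined to pipelined time for n independent instructions."""
    return unpipelined_ns(n) / pipelined_ns(n)


print(unpipelined_ns(3), pipelined_ns(3))  # 24 14
print(round(speedup(1_000_000), 2))        # 4.0
```

With these numbers the speedup approaches 8 / 2 = 4 rather than 5: five stages at 2 ns each add up to 10 ns per instruction, more than the 8 ns the unpipelined instruction needs, so the ideal is not reached even before hazards are considered.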

Pipelining
What makes it easy?
- all instructions are the same length
- just a few instruction formats
- memory operands appear only in loads and stores
What makes it hard?
- structural hazards: suppose we had only one memory
- control hazards: need to worry about branch instructions
- data hazards: an instruction depends on a previous instruction

Pipelining
We'll build a simple pipeline and look at these issues.
We'll talk about modern processors and what really makes it hard:
- exception handling
- trying to improve performance with out-of-order execution, etc.

Basic Idea
- IF: Instruction fetch
- ID: Instruction decode / register file read
- EX: Execute / address calculation
- MEM: Memory access
- WB: Write back
[Figure: the single-cycle datapath (PC, instruction memory, registers, sign extend, shift left 2, ALU, data memory) divided into these five stages.]
What do we need to add to actually split the datapath into stages?

Pipelined Datapath
[Figure: the datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB placed between the stages.]
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Corrected Datapath
[Figure: the corrected pipelined datapath, with the same pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB.]

Graphically Representing Pipelines
[Figure: multiple-clock-cycle diagram, CC 1 to CC 6, showing lw $10, 20($1) followed by sub $11, $2, $3 moving through the pipeline stages.]
Can help with answering questions like:
- how many cycles does it take to execute this code?
- what is the ALU doing during cycle 4?
Use this representation to help understand datapaths.
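Both questions on this slide can be answered mechanically once the diagram is treated as a table of (instruction, cycle) pairs. This is a minimal sketch, assuming an ideal five-stage pipeline with no stalls. The stage abbreviations and helper names are illustrative, and the two instructions are given only by their mnemonics.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to run n instructions through an ideal pipeline."""
    return n_instructions + n_stages - 1


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in clock cycle `cycle` (1-based)."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def who_is_in(stage, cycle, program):
    """Which instruction, if any, occupies `stage` during `cycle`."""
    for i, text in enumerate(program):
        if stage_in_cycle(i, cycle) == stage:
            return text
    return None


program = ["lw", "sub"]
print(total_cycles(len(program)))   # 6
print(who_is_in("EX", 4, program))  # sub
```

The two-instruction sequence finishes in CC 6, matching the width of the diagram, and in cycle 4 the ALU is working on the second instruction while the first one is in its memory access stage.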

Pipeline Control
[Figure: the pipelined datapath with its control lines identified: PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite and MemtoReg, plus the ALU control block driven by the low-order instruction bits.]

Pipeline control
We have 5 stages. What needs to be controlled in each stage?
- Instruction Fetch and PC Increment
- Instruction Decode / Register Fetch
- Execution
- Memory Stage
- Write Back

Pipeline control
How would control be handled in an automobile plant?
- a fancy control center telling everyone what to do?
- should we use a finite state machine?

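Pipeline Control
Pass control signals along just like the data.
[Table: control line settings for R-format, lw, sw and beq, grouped by the stage that uses them. Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc. Memory access stage control lines: Branch, MemRead, MemWrite. Write-back stage control lines: RegWrite, MemtoReg. sw and beq have don't-care (X) entries.]
[Figure: the control block generates all lines during decode; the EX, MEM and WB groups travel through the ID/EX, EX/MEM and MEM/WB pipeline registers.]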
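One way to picture "passing control along like the data" is to treat the decoded control lines as a bundle that shrinks as each stage consumes its group. This is a rough sketch, not the slide's hardware: the grouping follows the table headings above, the lw settings are the usual textbook values (an assumption here), and the function names are hypothetical.

```python
# Control lines grouped by the stage that consumes them.
EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_LINES = ("Branch", "MemRead", "MemWrite")
WB_LINES = ("RegWrite", "MemtoReg")


def decode_lw():
    """Control settings for lw (assumed textbook values, not read from the slide)."""
    return {
        "RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1,
        "Branch": 0, "MemRead": 1, "MemWrite": 0,
        "RegWrite": 1, "MemtoReg": 1,
    }


def split_by_stage(control):
    """Bundle written into ID/EX at the end of the decode stage."""
    return {
        "EX": {k: control[k] for k in EX_LINES},
        "MEM": {k: control[k] for k in MEM_LINES},
        "WB": {k: control[k] for k in WB_LINES},
    }


def advance(bundle, consumed):
    """Bundle stored in the next pipeline register once stage `consumed` has used its lines."""
    return {stage: lines for stage, lines in bundle.items() if stage != consumed}


id_ex = split_by_stage(decode_lw())
ex_mem = advance(id_ex, "EX")
mem_wb = advance(ex_mem, "MEM")
print(sorted(id_ex), sorted(ex_mem), sorted(mem_wb))
```

No central controller has to track which instruction is in which stage: each instruction carries its own remaining control bits down the pipe.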

Datapath with Control
[Figure: the pipelined datapath with the control block added in the decode stage and the EX, MEM and WB control fields carried through ID/EX, EX/MEM and MEM/WB to the lines they drive (PCSrc, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg, RegWrite).]

Dependencies
Problem with starting the next instruction before the first is finished: dependencies that go backward in time are data hazards.
[Figure: clock cycles CC 1 to CC 9 for the sequence below; the new value of register $2 is written in CC 5.]
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

Software Solution
Have the compiler guarantee no hazards. Where do we insert the nops?
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
Problem: this really slows us down!
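The nop question can be answered by a small scheduling pass. This is a sketch under stated assumptions, not the slides' method: there is no ALU forwarding, and the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. The tuple encoding, the helper name and the register numbers (taken from the usual textbook form of this sequence) are illustrative.

```python
def insert_nops(program, min_distance=3):
    """Insert nops so each reader is at least min_distance slots after the writer."""
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for text, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg in last_write:
                gap = len(out) - last_write[reg]
                needed = max(needed, min_distance - gap)
        out.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(out)
        out.append(text)
    return out


program = [
    ("sub $2, $1, $3",  "$2",  ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2",  "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None,  ["$15", "$2"]),
]
for line in insert_nops(program):
    print(line)
```

Under these assumptions two nops go between the sub and the and; the remaining readers of $2 are then far enough away. Two wasted slots for a five-instruction sequence is the slowdown the slide complains about.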

Forwarding
Use temporary results, don't wait for them to be written:
- register file forwarding to handle read/write to same register
- ALU forwarding
[Figure: clock cycles CC 1 to CC 9 for the sequence below, tracking the value of register $2 and the contents of EX/MEM and MEM/WB; the new value is available in EX/MEM in CC 4 and in MEM/WB in CC 5.]
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
What if this $2 was $13?

Forwarding
[Figure: the pipelined datapath with a forwarding unit. The register numbers IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd are carried forward as Rs, Rt and Rd; the forwarding unit compares them with EX/MEM.RegisterRd and MEM/WB.RegisterRd and selects the ALU inputs.]
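The forwarding unit's decision for one ALU operand can be sketched as a pair of comparisons. This is a minimal sketch rather than the exact logic in the figure: the dictionary fields, the function name and the returned labels are illustrative, and the rule that register 0 is never forwarded is the usual MIPS convention, assumed here.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Pick the source of one ALU operand: 'EX/MEM', 'MEM/WB' or 'REGFILE'.

    ex_mem and mem_wb describe the instructions one and two slots ahead:
    {"RegWrite": bool, "Rd": destination register number}.
    """
    # The most recent result wins, so check EX/MEM first.
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return "EX/MEM"
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"


# 'and $12, $2, $5' in EX while 'sub $2, $1, $3' is one stage ahead.
sub_in_ex_mem = {"RegWrite": True, "Rd": 2}
idle = {"RegWrite": False, "Rd": 0}
print(forward_select(2, sub_in_ex_mem, idle))  # EX/MEM
print(forward_select(5, sub_in_ex_mem, idle))  # REGFILE
```

Checking EX/MEM before MEM/WB matters when two earlier instructions both write the same register: the operand should come from the younger of the two.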

Can't always forward
Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register.
[Figure: clock cycles CC 1 to CC 9 for the sequence below.]
    lw $2, 20($1)
    and $4, $2, $5
    or $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7
Thus, we need a hazard detection unit to stall the instruction that follows the load.

Stalling
We can stall the pipeline by keeping an instruction in the same stage.
[Figure: clock cycles CC 1 to CC 10 for the sequence below; the and and the or are held for one cycle and a bubble is inserted after the lw.]
    lw $2, 20($1)
    and $4, $2, $5
    or $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7

Hazard Detection Unit
Stall by letting an instruction that won't write anything go forward.
[Figure: the pipelined datapath with a hazard detection unit and the forwarding unit. The hazard detection unit looks at ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt, and controls writes to the PC and to the IF/ID register as well as the control values passed into ID/EX.]
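The load-use check performed by the hazard detection unit comes down to one condition on the signals named in the figure. The sketch below is an illustration under assumptions: the function name and the shape of the returned dictionary are made up, and the three outputs model "hold the PC", "hold IF/ID" and "send a do-nothing instruction forward".

```python
def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether the instruction in decode must wait for a load ahead of it."""
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,     # hold the PC so the next fetch is repeated
        "IF/IDWrite": not stall,  # hold the instruction being decoded
        "bubble": stall,          # zero the control lines entering ID/EX
    }


# lw $2, 20($1) is in EX while and $4, $2, $5 is being decoded.
print(hazard_detect(True, 2, 2, 5)["bubble"])  # True
# The same and behind an instruction that is not a load.
print(hazard_detect(False, 2, 2, 5)["bubble"])  # False
```

Zeroing the control lines is what makes the bubble "an instruction that won't write anything": it flows down the pipe but changes neither registers nor memory.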

Branch Hazards
When we decide to branch, other instructions are in the pipeline!
[Figure: clock cycles CC 1 to CC 9 for the sequence below, shown with instruction addresses.]
    40 beq $1, $3, 7
    44 and $12, $2, $5
    48 or $13, $6, $2
    52 add $14, $2, $2
    72 lw $4, 50($7)
We are predicting "branch not taken"; we need to add hardware for flushing instructions if we are wrong.
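The addresses in this example follow from the MIPS branch rule, in which the offset counts words relative to the instruction after the branch. This is a small sketch of the arithmetic. The helper names are illustrative, and the penalty of three wrongly fetched instructions is read off the figure (44, 48 and 52 enter the pipeline before the target at 72).

```python
def branch_target(pc, offset_words):
    """Target address of a taken MIPS branch located at byte address pc."""
    return pc + 4 + (offset_words << 2)


def wrong_path_fetches(pc, penalty=3):
    """Addresses fetched under 'predict not taken' before the branch is resolved."""
    return [pc + 4 * (i + 1) for i in range(penalty)]


print(branch_target(40, 7))    # 72
print(wrong_path_fetches(40))  # [44, 48, 52]
```

If the branch is taken, the instructions at those three addresses must be flushed, which is why resolving the branch earlier in the pipeline reduces the cost of a wrong prediction.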

Flushing Instructions
[Figure: the pipelined datapath with an IF.Flush control line, an equality comparator (=) on the register outputs in the decode stage, the hazard detection unit and the forwarding unit.]

Improving Performance
Try and avoid stalls! E.g., reorder these instructions:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
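The benefit of reordering can be checked by counting load-use pairs before and after. This is a sketch under assumptions: forwarding is present, a stall occurs only when an instruction reads the destination of the lw immediately before it (including a sw that stores that register), and the tuple encoding and function name are illustrative.

```python
def load_use_stalls(program):
    """Count adjacent pairs where an instruction reads the register a lw just loaded."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


# (opcode, destination register, source registers)
original = [
    ("lw", "$t0", ["$t1"]),
    ("lw", "$t2", ["$t1"]),
    ("sw", None, ["$t2", "$t1"]),
    ("sw", None, ["$t0", "$t1"]),
]
# Swap the two stores; they write different addresses, so the result is the same.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original), load_use_stalls(reordered))  # 1 0
```

Moving the store of $t0 ahead of the store of $t2 puts one independent instruction between the second load and its use, which is enough to hide the load delay in this model.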

Improving Performance
Add a "branch delay slot":
- the next instruction after a branch is always executed
- rely on compiler to fill the slot with something useful
Superscalar: start more than one instruction in the same cycle.

Dynamic Scheduling
The hardware performs the scheduling:
- hardware tries to find instructions to execute
- out-of-order execution is possible
- speculative execution and dynamic branch prediction

Dynamic Scheduling
All modern processors are very complicated:
- DEC Alpha 21264: 9 stage pipeline, 6 instruction issue
- PowerPC and Pentium: branch history table
- Compiler technology important

Dynamic Scheduling
This class has given you the background you need to learn more.
Video: An Overview of Intel's Pentium Processor