CpE 442. Designing a Pipeline Processor (lect. II)
- Moses Marshall
Transcription
1 CpE 442 Designing a Pipeline Processor (lect. II) CPE 442 hazards.1
2 Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to Hazards (15 minutes)
- Forwarding (25 minutes)
- 1 cycle Load Delay (5 minutes)
- 1 cycle Branch Delay (15 minutes)
- What makes pipelining hard
- Summary (5 minutes)
3 Review: Single Cycle, Multiple Cycle, vs. Pipeline
[Timing diagram: Clk, Cycle 1 to Cycle 10]
- Single Cycle Implementation: Load, Store, Waste
- Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), R-type (Ifetch, ...)
- Pipeline Implementation: Load, Store, and R-type each go through Ifetch, Reg, Exec, Mem, Wr
4 Review: A Pipelined Datapath
[Figure: pipelined datapath. Stages: Ifetch, Reg/Dec, Exec, Mem, Wr, separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Blocks: PC, IUnit, RFile, Exec Unit, Data Mem. Control signals: RegWr, ExtOp, ALUOp, Branch, RegDst, ALUSrc, MemWr, MemtoReg.]
5 Review: Pipeline Control: Data Stationary Control
The Main Control generates the control signals during Reg/Dec:
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: the control signals travel with the instruction through the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers; each register drops the signals that have already been used.]
6 Review: Pipeline Summary
Pipeline Processor:
- Natural enhancement of the multiple clock cycle processor
- Each functional unit can only be used once per instruction
- If an instruction is going to use a functional unit, it must use it at the same stage as all other instructions
Pipeline Control:
- Each stage's control signal depends ONLY on the instruction that is currently in that stage
7 Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to Hazards
- Forwarding (25 minutes)
- 1 cycle Load Delay (5 minutes)
- 1 cycle Branch Delay (15 minutes)
- What makes pipelining hard
- Summary (5 minutes)
8 Introduction to Hazards
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
- structural hazards: HW cannot support this combination of instructions
- data hazards: instruction depends on result of prior instruction still in the pipeline
- control hazards: pipelining of branches & other instructions that change the PC
Common solution is to stall the pipeline until the hazard is resolved, inserting "bubbles" in the pipeline.
9 A Single Memory is a Structural Hazard
[Pipeline diagram, time (clock cycles) vs. instr. order: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Mem, Reg, ALU, Mem, Reg. The Load's data access and the fetch of Instr 3 need the single memory in the same cycle.]
10 Option 1: Stall to Resolve Memory Structural Hazard
[Pipeline diagram, time (clock cycles) vs. instr. order: Load, Instr 1, Instr 2, Instr 3 (stall), Instr 4; a bubble is inserted so that the instruction fetch no longer collides with the Load's memory access.]
11 Option 2: Duplicate to Resolve Structural Hazard
Separate Instruction Cache (Im) & Data Cache (Dm).
[Pipeline diagram, time (clock cycles) vs. instr. order: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Im, Reg, ALU, Dm, Reg.]
12 Data Hazard on r1
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
13 Data Hazard on r1 (Figure 6.30, page 397, P&H)
Dependencies backwards in time are hazards.
[Pipeline diagram, time (clock cycles) vs. instr. order, stages IF, ID/RF, EX, MEM, WB: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
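The dependence check behind this slide can be sketched in a few lines of Python. This is only an illustration: the `Instr` tuple, the `raw_hazards` function, and the `window` parameter are made up here, and the default window of 3 assumes no forwarding and a register file whose new value cannot be read until the cycle after write-back, as the later "Delay Load" slide states.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def raw_hazards(program: List[Instr], window: int = 3) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for every source
    operand whose most recent writer is at most `window` instructions earlier."""
    hazards = []
    for i, instr in enumerate(program):
        for src in instr.srcs:
            # Scan backwards for the closest earlier writer of this register.
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if program[j].dest == src:
                    hazards.append((j, i, src))
                    break
    return hazards


# The five-instruction sequence from the slide.
program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r3")),
    Instr("and", "r6", ("r1", "r7")),
    Instr("or", "r8", ("r1", "r9")),
    Instr("xor", "r10", ("r1", "r11")),
]
hazards = raw_hazards(program)
```

Under these assumptions sub, and, and or are flagged as depending on add through r1, while xor is far enough behind to read the register file safely.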
14 Option 1: HW Stalls to Resolve Data Hazard
Dependencies backwards in time are hazards.
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11. Three bubbles delay sub and the instructions behind it.]
15 But recall use of Data Stationary Control
The Main Control generates the control signals during Reg/Dec:
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: same control pipeline as on slide 5.]
16 Option 1: How HW really stalls pipeline
HW doesn't change PC => keeps fetching same instruction & sets control signals to benign values (0).
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: add r1,r2,r3; stall; stall; stall; sub r4,r1,r3; and r6,r1,r7. Each stall cycle fetches (Im) and then carries bubbles through the remaining stages.]
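A rough way to count the stall cycles such an interlock inserts is to track the cycle in which each instruction reads its registers. The sketch below is not the course's hardware; the function names and the tuple encoding are invented for illustration, and it assumes no forwarding, in-order issue, and that a register can first be read in the cycle after its writer's Wr stage.

```python
from typing import Dict, List, Optional, Tuple

# (op, destination register or None, source registers); an illustrative encoding.
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def decode_cycles(program: List[Instr]) -> List[int]:
    """Cycle in which each instruction is in Reg/Dec, with a stalling interlock."""
    wb_cycle: Dict[str, int] = {}  # register -> cycle its latest writer is in Wr
    cycles: List[int] = []
    prev = 1                       # first instruction is fetched in cycle 1
    for _op, dest, srcs in program:
        earliest = prev + 1        # in-order: one cycle behind the previous decode
        for src in srcs:
            if src in wb_cycle:
                # The value is readable only in the cycle after write-back.
                earliest = max(earliest, wb_cycle[src] + 1)
        cycles.append(earliest)
        if dest is not None:
            wb_cycle[dest] = earliest + 3  # Reg/Dec -> Exec -> Mem -> Wr
        prev = earliest
    return cycles


def stall_cycles(program: List[Instr]) -> int:
    """Total bubbles inserted between consecutive instructions."""
    cycles = decode_cycles(program)
    return sum(later - earlier - 1 for earlier, later in zip(cycles, cycles[1:]))


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
]
stalls = stall_cycles(program)
```

For the add/sub/and sequence this gives three stall cycles in front of sub and none in front of and, which matches the three stalls drawn on the slide.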
17 Option 2: SW inserts independent instructions
Worst case inserts NOP instructions.
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: add r1,r2,r3; nop; nop; nop; sub r4,r1,r3; and r6,r1,r7.]
18 Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to Hazards (15 minutes)
- Forwarding
- 1 cycle Load Delay (5 minutes)
- 1 cycle Branch Delay (15 minutes)
- What makes pipelining hard
- Summary (5 minutes)
19 Option 3 Insight: Data is available! (Figure 6.35, page 415, P&H)
Pipeline registers already contain needed data.
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
20 HW Change for Forwarding (Bypassing)
- Increase multiplexors to add paths from pipeline registers
- Assumes register read during write gets new value (otherwise more results to be forwarded)
[Figure: ID/EX, EX/MEM, and MEM/WB registers, ALU with Zero? output, Data Memory.]
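The select logic for one of those added multiplexors might look like the sketch below. The conditions follow the usual textbook formulation (prefer the newer EX/MEM result, never forward register 0); the slide itself does not spell them out, so treat the function name, argument names, and return labels as assumptions.

```python
def forward_select(src_reg: int,
                   exmem_reg_write: bool, exmem_rd: int,
                   memwb_reg_write: bool, memwb_rd: int) -> str:
    """Pick the source of one ALU operand for the instruction now in EX.

    Returns "EX/MEM", "MEM/WB", or "REG" (the value read from the register file).
    """
    # The newest result wins: the instruction one ahead is in the EX/MEM register.
    if exmem_reg_write and exmem_rd != 0 and exmem_rd == src_reg:
        return "EX/MEM"
    # Otherwise take the result of the instruction two ahead, if it matches.
    if memwb_reg_write and memwb_rd != 0 and memwb_rd == src_reg:
        return "MEM/WB"
    return "REG"


# sub r4,r1,r3 in EX while add r1,r2,r3 sits in EX/MEM: operand r1 is forwarded.
sel_sub = forward_select(1, True, 1, False, 0)
# and r6,r1,r7 in EX one cycle later: add is in MEM/WB, sub (writing r4) in EX/MEM.
sel_and = forward_select(1, True, 4, True, 1)
```

Checking EX/MEM first matters when two in-flight instructions write the same register, because the operand must come from the most recent writer.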
21 Complete Data Path with Hazard Detection and Forwarding (Figure 6.41 in the text)
[Figure: PC, Instruction memory, Registers with "=" comparator, Sign extend, Shift left 2, ALU, Data memory; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying the EX, M, and WB control fields; Control, Hazard detection unit, Forwarding unit, IF.Flush.]
22 Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to Hazards (15 minutes)
- Forwarding (25 minutes)
- 1 cycle Load Delay
- 1 cycle Branch Delay (15 minutes)
- What makes pipelining hard
- Summary (5 minutes)
23 From Last Lecture: The Delay Load Phenomenon
[Timing diagram, Clock, Cycle 1 to Cycle 8: I0: Load, Plus 1, Plus 2, Plus 3, Plus 4, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr.]
Although Load is fetched during Cycle 1:
- The data is NOT written into the Reg File until the end of Cycle 5
- We cannot read this value from the Reg File until Cycle 6
- 3-instruction delay before the load takes effect
24 Forwarding reduces Data Hazard to 1 cycle (Figure 6.47, page 420, P&H)
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: lw r1,0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9.]
25 Option 1: HW Stalls to Resolve Data Hazard
"Interlock": checks for hazard & stalls.
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: lw r1,0(r2); stall; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9. The stall cycle carries bubbles through the pipeline.]
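The check such an interlock performs can be written as a single predicate. The version below follows the usual textbook load-use condition for a pipeline with forwarding; the function and argument names are illustrative and not taken from the slides.

```python
def load_use_stall(idex_mem_read: bool, idex_rt: int,
                   ifid_rs: int, ifid_rt: int) -> bool:
    """True when the instruction in decode must wait one cycle for a load.

    idex_mem_read: the instruction now in EX is a load.
    idex_rt:       the register that load will write.
    ifid_rs/rt:    source registers of the instruction now in decode.
    """
    return idex_mem_read and idex_rt in (ifid_rs, ifid_rt)


# lw r1,0(r2) is in EX while sub r4,r1,r3 is being decoded: one bubble is needed.
needs_bubble = load_use_stall(True, 1, 1, 3)
```

Only the instruction immediately behind the load is affected; with forwarding, the loaded value reaches and r6,r1,r7 in time without a further stall.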
26 Option 2: SW inserts independent instructions
Worst case inserts NOP instructions. MIPS I solution: No HW checking.
[Pipeline diagram, stages IF, ID/RF, EX, MEM, WB: lw r1,0(r2); nop; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9.]
27 Software Scheduling to Avoid Load Hazards
Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f in memory.
Slow code:
  LW  Rb,b
  LW  Rc,c
  ADD Ra,Rb,Rc
  SW  a,Ra
  LW  Re,e
  LW  Rf,f
  SUB Rd,Re,Rf
  SW  d,Rd
28 Software Scheduling to Avoid Load Hazards
Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f in memory.
Slow code:
  LW  Rb,b
  LW  Rc,c
  ADD Ra,Rb,Rc
  SW  a,Ra
  LW  Re,e
  LW  Rf,f
  SUB Rd,Re,Rf
  SW  d,Rd
Fast code:
  LW  Rb,b
  LW  Rc,c
  LW  Re,e
  ADD Ra,Rb,Rc
  LW  Rf,f
  SW  a,Ra
  SUB Rd,Re,Rf
  SW  d,Rd
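To see why the reordered version is faster, one can count the load-use pairs in each listing. The sketch assumes full forwarding and a one-cycle load delay, so the only stall is an instruction that reads a register loaded by the instruction directly before it; the tuple encoding and the `load_use_stalls` helper are illustrative.

```python
from typing import List, Optional, Tuple

# (op, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def load_use_stalls(program: List[Instr]) -> int:
    """Count instructions that use the result of the load immediately before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]
slow_stalls = load_use_stalls(slow)
fast_stalls = load_use_stalls(fast)
```

Under these assumptions the slow code stalls twice (ADD right after LW Rc, SUB right after LW Rf) and the fast code not at all, because an independent instruction now sits between each load and its first use.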
30 Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to Hazards (15 minutes)
- Forwarding (25 minutes)
- 1 cycle Load Delay (5 minutes)
- 1 cycle Branch Delay
- What makes pipelining hard
- Summary (5 minutes)
31 From Last Lecture: The Delay Branch Phenomenon
[Timing diagram, Clk, Cycle 4 to Cycle 11: 12: Beq (target is 1000); 16: R-type; 20: R-type; 24: R-type; 1000: Target of Br; each passing through Ifetch, Reg/Dec, Exec, Mem, Wr.]
Although Beq is fetched during Cycle 4:
- Target address is NOT written into the PC until the end of Cycle 7
- Branch's target is NOT fetched until Cycle 8
- 3-instruction delay before the branch takes effect
32 Control Hazard on Branches: 3 stage stall
[Pipeline diagram, time in clock cycles (CC 1 to CC 9) vs. program execution order (in instructions):
  40 beq $1,$3,36
  44 and $12,$2,$5
  48 or  $13,$6,$2
  52 add $14,$2,$2
  80 ld  $4,$7,100]
33 Branch Stall Impact
If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!
2 part solution:
- Determine branch taken or not sooner, AND
- Compute taken branch address earlier
Solution Option 1:
- Move Zero test to ID/RF stage
- Adder to calculate new PC in ID/RF stage
- 1 clock cycle penalty for branch vs. 3
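The 1.9 figure is the base CPI plus the branch frequency times the stall penalty. A small sketch of that arithmetic follows; the function name is made up, and it assumes every branch pays the full penalty.

```python
def cpi_with_branch_stalls(base_cpi: float, branch_freq: float, penalty: int) -> float:
    """Average cycles per instruction when each branch costs `penalty` stall cycles."""
    return base_cpi + branch_freq * penalty


cpi_3_cycle = cpi_with_branch_stalls(1.0, 0.30, 3)  # about 1.9, as on the slide
cpi_1_cycle = cpi_with_branch_stalls(1.0, 0.30, 1)  # about 1.3 after Option 1
```

Moving the zero test and the target adder into ID/RF therefore brings the CPI from about 1.9 down to about 1.3 for the same 30% branch mix.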
34 Option 1: move HW forward to reduce branch delay
Data Path before change.
[Figure: stages Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc., Memory Access, Write Back, separated by IF/ID, ID/EX, EX/MEM, MEM/WB. Blocks: PC, +4 adder, Instruction Memory, Registers, Sign extend, ALU, Zero? test, Data Memory; the IR is carried along through the pipeline registers (MEM/WB.IR).]
35 Branch Delay now 1 clock cycle
Data Path after change.
[Figure: same stages and pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB) as before the change, now with the Zero? test and a second adder in the Instr. Decode / Reg. Fetch stage.]
36 Option 2: No Stalls, Define Branch as Delayed
Insert an instruction after the branch and allow it to execute always. Worst case, SW inserts NOP into the branch delay slot if no instruction can be found.
Where to get instructions to fill the branch delay slot?
- Before branch instruction; example: sw r1,0(r2); beqd r0,r2,T changes to beqd r0,r2,T; sw r1,0(r2)
- From the target address: only valuable when branch is taken
- From fall through: only valuable when branch is not taken
Compiler effectiveness for single branch delay slot:
- Fills about 60% of branch delay slots
- About 80% of instructions executed in branch delay slots useful in computation
- About 50% (60% x 80%) of slots usefully filled
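The "about 50%" figure is the product of the two rates above. The sketch below redoes that arithmetic and, as a hypothetical extension, estimates the resulting CPI; the 30% branch frequency is borrowed from the Branch Stall Impact slide, and treating every slot that is not usefully filled as one wasted cycle is an assumption, not a claim from the slides.

```python
def useful_slot_fraction(fill_rate: float, useful_rate: float) -> float:
    """Fraction of branch delay slots that end up doing useful work."""
    return fill_rate * useful_rate


def cpi_with_delayed_branch(base_cpi: float, branch_freq: float,
                            useful_fraction: float) -> float:
    """Assumes each branch has one delay slot, wasted unless usefully filled."""
    return base_cpi + branch_freq * (1.0 - useful_fraction)


useful = useful_slot_fraction(0.60, 0.80)          # 0.48, i.e. roughly 50%
cpi = cpi_with_delayed_branch(1.0, 0.30, useful)   # roughly 1.16
```

Compared with the 1-cycle branch penalty, a usefully filled delay slot recovers about half of the remaining loss under these assumptions.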
37 Complete Data Path with Hazard Detection and Forwarding (Figure 6.41 in the text)
[Figure: same datapath as on slide 21, with IF.Flush, Hazard detection unit, Control, and Forwarding unit.]
38 Example, Text Figure 6.52
[Figure: two snapshots of the datapath with IF.Flush, Hazard detection unit, and Forwarding unit.
Clock 3, instructions from IF to WB: and $12,$2,$5; beq $1,$3,7; sub $10,$4,$8; before<1>; before<2>.
Clock 4, instructions from IF to WB: lw $4,50($7); bubble (nop); beq $1,$3,7; sub $10,...; before<1>.]
39 Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to Hazards (15 minutes)
- Forwarding (25 minutes)
- 1 cycle Load Delay (5 minutes)
- 1 cycle Branch Delay (15 minutes)
- What makes pipelining hard
- Summary (5 minutes)
40 When is pipelining hard?
Interrupts: 5 instructions executing in 5 stage pipeline
- How to stop the pipeline?
- Restart?
- Who caused the interrupt?
Stage   Problem interrupts occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic interrupt
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation
Load with data page fault, Add with instruction page fault?
Solution 1: interrupt vector/instruction. Solution 2: interrupt ASAP, restart everything incomplete.
41 Data Path with Exception Handling (Text Figure 6.55)
Add a Cause register, an Exception PC, and the constant address of the exception handling routine.
[Figure: datapath of Figure 6.41 extended with IF.Flush, ID.Flush, and EX.Flush signals, a Cause register, and an Except PC register.]
42 Review: Summary of Pipelining Basics
Speed Up ≤ Pipeline Depth (number of stages); if ideal CPI is 1, then:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$
Hazards limit performance on computers:
- structural: need more HW resources
- data: need forwarding, compiler scheduling
- control: early evaluation & PC, delayed branch, prediction
Increasing length of pipe increases impact of hazards since pipelining helps instruction bandwidth, not latency.
Compilers key to reducing cost of data and control hazards:
- load delay slots
- branch delay slots
Exceptions, Instruction Set, FP makes pipelining harder.
Longer pipelines => Branch prediction, more instruction parallelism?
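As a quick numerical check of the speedup relation, the sketch below evaluates it for a 5-stage pipeline. The function name and the sample numbers are illustrative; the 0.9 stall cycles per instruction reuse the 30% branch, 3-cycle stall case from the Branch Stall Impact slide, and equal clock cycles are assumed unless stated otherwise.

```python
def pipeline_speedup(depth: int, stall_cycles_per_instr: float,
                     clock_unpipelined: float = 1.0,
                     clock_pipelined: float = 1.0) -> float:
    """Speedup over the unpipelined machine, assuming an ideal CPI of 1."""
    return (depth / (1.0 + stall_cycles_per_instr)) * (clock_unpipelined / clock_pipelined)


ideal = pipeline_speedup(5, 0.0)          # 5.0: no hazards, speedup equals depth
with_branches = pipeline_speedup(5, 0.9)  # about 2.63 with 3-cycle branch stalls
with_option_1 = pipeline_speedup(5, 0.3)  # about 3.85 with a 1-cycle branch penalty
```

The gap between 5.0 and 2.63 shows how quickly stall cycles erode the ideal speedup as hazards go unresolved.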
More informationECEN454 Digital Integrated Circuit Design. Sequential Circuits. Sequencing. Output depends on current inputs
ECEN454 igital Integrated Circuit esign Sequential Circuits ECEN 454 Combinational logic Sequencing Output depends on current inputs Sequential logic Output depends on current and previous inputs Requires
More informationI/O Interfacing. What we are going to learn in this session:
I/O Interfacing ECE 5: Digital System & Microprocessor What we are going to learn in this session: M6823 Parallel Interface Timer. egisters in the M6823. Port initialization method. How M6823 interfaces
More informationStudy on evaluation method of the pure tone for small fan
Study on evaluation method of the pue tone fo small fan Takao YAMAGUCHI 1 ; Gaku MINORIKAWA 2 ; Masayuki KIHARA 3 1, 2 Hosei Univesity, Japan 3 Shap Copoation, Japan ABSTRACT In the field of audio, visual
More informationStructural Fault Tolerance for SOC
Structural Fault Tolerance for SOC Soft Error Fault Tolerant Systems Hrushikesh Chavan Department of ECE, University of Wisconsin Madison, USA hchavan@wisc.edu Younggyun Cho Department of ECE, University
More informationCPE/EE 427, CPE 527 VLSI Design I Sequential Circuits. Sequencing
CPE/EE 427, CPE 527 VLSI esign I Sequential Circuits epartment of Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic ( www.ece.uah.edu/~milenka ) Combinational
More informationEE 447/547 VLSI Design. Lecture 9: Sequential Circuits. VLSI Design EE 447/547 Sequential circuits 1
EE 447/547 VLSI esign Lecture 9: Sequential Circuits Sequential circuits 1 Outline Floorplanning Sequencing Sequencing Element esign Max and Min-elay Clock Skew Time Borrowing Two-Phase Clocking Sequential
More informationMore Digital Circuits
More Digital Circuits 1 Signals and Waveforms: Showing Time & Grouping 2 Signals and Waveforms: Circuit Delay 2 3 4 5 3 10 0 1 5 13 4 6 3 Sample Debugging Waveform 4 Type of Circuits Synchronous Digital
More informationCompact Beamformer Design with High Frame Rate for Ultrasound Imaging
Sensos & Tansduces 2014 by IFSA Publishing, S. L. http://www.sensospotal.com Compact Beamfome Design with High Fame Rate fo Ultasound Imaging Jun Luo, Qijun Huang, Sheng Chang, Xiaoying Song, Hao Wang
More informationComputer Architecture Basic Computer Organization and Design
After the fetch and decode phase, PC contains 31, which is the address of the next instruction in the program (the return address). The register AR holds the effective address 170 [see figure 6.10(a)].
More informationRegisters. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers
Unit 2 Registers and Counters Fundamentals of Logic esign EE2369 Prof. Eric Maconald Fall Semester 23 Registers Groups of flip-flops Can contain data format can be unsigned, 2 s complement and other more
More informationA Low-Power 0.7-V H p Video Decoder
A Low-Power 0.7-V H.264 720p Video Decoder D. Finchelstein, V. Sze, M.E. Sinangil, Y. Koken, A.P. Chandrakasan A-SSCC 2008 Outline Motivation for low-power video decoders Low-power techniques pipelining
More informationLecture 10: Sequential Circuits
Introduction to CMOS VLSI esign Lecture 10: Sequential Circuits avid Harris Harvey Mudd College Spring 2004 1 Outline Floorplanning Sequencing Sequencing Element esign Max and Min-elay Clock Skew Time
More informationPerformance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques
Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Madhavi Anupoju 1, M. Sunil Prakash 2 1 M.Tech (VLSI) Student, Department of Electronics & Communication Engineering, MVGR
More informationGrant Spacing Signaling at the ONU
Gant Spacing Signaling at the ONU Glen Kame, Boadcom Duane Remein, Huawei May 2018 IEEE 802.3ca Task Foce, ittsbugh, A 1 Total Bust Size In 802.3ca, the OLT GATE message conveys only the payload length
More informationCacheCompress A Novel Approach for Test Data Compression with cache for IP cores
CacheCompress A Novel Approach for Test Data Compression with cache for IP cores Hao Fang ( 方昊 ) fanghao@mprc.pku.edu.cn Rizhao, ICDFN 07 20/08/2007 To be appeared in ICCAD 07 Sections Introduction Our
More informationA Parallel Multilevel-Huffman Decompression Scheme for IP Cores with Multiple Scan Chains
A Parallel Mltilevel-Hffman Decompression Scheme for IP Cores with Mltiple Scan Chains X Kavosianos, E Kalligeros 2 and D Nikolos 2 Compter Science Dept, University of Ioannina, 45 Ioannina, Greece 2 Compter
More informationA QUERY BY HUMMING SYSTEM THAT LEARNS FROM EXPERIENCE
A QUERY BY HUMMING SYSTEM THAT LEARNS FROM EXPERIENCE David Little, David Raffenspege, Byan Pado EECS Depatment Nothwesten Univesity Evanston, IL 60201 d-little,d-affenspege,pado@nothwesten.edu ABSTRACT
More informationFirst Name Last Name November 10, 2009 CS-343 Exam 2
CS-343 Exam 2 Instructions: For multiple choice questions, circle the letter of the one best choice unless the question explicitly states that it might have multiple correct answers. There is no penalty
More informationCS 250 VLSI System Design
CS 250 VLSI System Design Lecture 3 Timing 2013-9-5 Professor Jonathan Bachrach today s lecture by John Lazzaro TA: Ben Keller www-insteecsberkeleyedu/~cs250/ 1 everything doesn t happen at once Timing,
More informationVector IRAM Memory Performance for Image Access Patterns Richard M. Fromm Report No. UCB/CSD-99-1067 October 1999 Computer Science Division (EECS) University of California Berkeley, California 94720 Vector
More information11. Sequential Elements
11. Sequential Elements Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October 11, 2017 ECE Department, University of Texas at Austin
More informationEWCM 900. technical user manual. electronic controller for compressors and fans
EWCM 900 technical use manual electonic contolle fo compessos and fans Summay 1. INTRODUCTION...5 1.1. VERSIONS... 5 1.2. GENERAL CHARACTERISTICS... 5 2. USER INTERFACE...6 2.1. COMPRESSOR SECTION... 6
More informationSoftware Manual Control Panel for Professional Single Booster Units Models: MM3 BW3
Software Manual Control Panel for Professional Single Booster Units Models: MM3 BW3 EN Software Manual.. 1-14 1 1. DESCRIPTION 3 2. DISPLAY LAYOUT 4 3. MODES 5 3.1 Power On 5 3.2 Standby 5 3.3 Power off
More informationChapter. Sequential Circuits
Chapter Sequential Circuits Circuits Combinational circuit The output depends only on the input Sequential circuit Has a state The output depends not only on the input but also on the state the circuit
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 21 State Elements : Circuits that Remember 2007-03-07 Mocha sipping TA Valerie Ishida inst.eecs.berkeley.edu/~cs61c-td 161 Exabytes
More informationInvestigation on Technical Feasibility of Stronger RS FEC for 400GbE
Investigation on Technical Feasibility of Stronger RS FEC for 400GbE Mark Gustlin-Xilinx, Xinyuan Wang, Tongtong Wang-Huawei, Martin Langhammer-Altera, Gary Nicholl-Cisco, Dave Ofelt-Juniper, Bill Wilkie-Xilinx,
More information