CpE 442 Designing a Pipeline Processor (lect. II) CPE 442 hazards.1
Outline of Today's Lecture: Recap and Introduction (5 minutes); Introduction to Hazards (15 minutes); Forwarding (25 minutes); 1 cycle Load Delay (5 minutes); 1 cycle Branch Delay (15 minutes); What makes pipelining hard; Summary (5 minutes)
Review: Single Cycle, Multiple Cycle, vs. Pipeline
[Timing diagram: clock cycles 1 to 10 for the three implementations]
- Single Cycle Implementation: Load fills Cycle 1; Store runs in Cycle 2, with part of that cycle wasted
- Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...), one after another
- Pipeline Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, each starting one cycle after the previous instruction
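As a rough check on the diagram above, the cycle counts of the three implementations can be tallied for a Load, Store, R-type sequence. This is a minimal sketch, not part of the lecture: the function names are made up for illustration, single-cycle time is measured in units of one pipeline stage (five per instruction), and the 4-cycle R-type count is an assumption because the figure text is cut off.

```python
# Cycles per instruction class in the multiple cycle implementation.
# Load = 5 and Store = 4 follow the diagram; R-type = 4 is an assumption.
MULTI_CYCLE = {"load": 5, "store": 4, "rtype": 4}
STAGES = 5  # Ifetch, Reg, Exec, Mem, Wr

def single_cycle_time(program):
    """Every instruction takes one long cycle sized for the slowest one (Load)."""
    return len(program) * STAGES

def multi_cycle_time(program):
    """Each instruction takes only the short cycles it needs, back to back."""
    return sum(MULTI_CYCLE[op] for op in program)

def pipelined_time(program):
    """The first instruction needs STAGES cycles; each later one ends a cycle after."""
    return STAGES + len(program) - 1 if program else 0

program = ["load", "store", "rtype"]
print(single_cycle_time(program), multi_cycle_time(program), pipelined_time(program))
# prints: 15 13 7
```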
Review: A Pipelined Datapath
[Figure: pipelined datapath. Stages: Ifetch, Reg/Dec, Exec, Mem, Wr. Pipeline registers: IF/ID, ID/Ex, Ex/Mem, Mem/Wr. Main blocks: IUnit (PC, PC+4), RFile, Exec Unit, Data Mem. Control signals shown: RegWr, ExtOp, ALUOp, Branch, RegDst, ALUSrc, MemWr, MemtoReg.]
Review: Pipeline Control ("Data Stationary Control")
- The Main Control generates the control signals during Reg/Dec
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: the control signals travel with the instruction through the pipeline registers. ID/Ex holds ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr; Ex/Mem holds MemWr, Branch, MemtoReg, RegWr; Mem/Wr holds MemtoReg, RegWr.]
Review: Pipeline Summary
- Pipeline Processor:
  - Natural enhancement of the multiple clock cycle processor
  - Each functional unit can only be used once per instruction
  - If an instruction is going to use a functional unit, it must use it at the same stage as all other instructions
- Pipeline Control:
  - Each stage's control signal depends ONLY on the instruction that is currently in that stage
Outline of Today's Lecture: Recap and Introduction (5 minutes); Introduction to Hazards; Forwarding (25 minutes); 1 cycle Load Delay (5 minutes); 1 cycle Branch Delay (15 minutes); What makes pipelining hard; Summary (5 minutes)
Introduction to Hazards
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - structural hazards: HW cannot support this combination of instructions
  - data hazards: instruction depends on the result of a prior instruction still in the pipeline
  - control hazards: pipelining of branches & other instructions that change the PC
- Common solution is to stall the pipeline until the hazard is resolved, inserting "bubbles" in the pipeline
A Single Memory is a Structural Hazard
[Figure: pipeline diagram, time (clock cycles) across, instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4. Every instruction uses the one memory (Mem) both for its fetch and for its data access.]
Option 1: Stall to Resolve Memory Structural Hazard
[Figure: pipeline diagram, time (clock cycles) across, instruction order down: Load, Instr 1, Instr 2, Instr 3 (stall), Instr 4. The stalled slot moves down the pipeline as a bubble.]
Option 2: Duplicate to Resolve Structural Hazard
- Separate Instruction Cache (Im) & Data Cache (Dm)
[Figure: pipeline diagram, time (clock cycles) across, instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Im, Reg, Dm, Reg with no stall.]
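The single-memory conflict can be located with a few lines of code. This is an illustrative sketch only: `fetch_conflicts` is a hypothetical helper, and it assumes instruction i is fetched in cycle i+1, reaches its memory stage three cycles later, and that no stalls have been inserted yet.

```python
MEM_STAGE_OFFSET = 3  # Mem is the 4th stage, so 3 cycles after the fetch

def fetch_conflicts(program):
    """Return the indices of instructions whose fetch collides with an
    earlier instruction's data access when there is one shared memory."""
    conflicts = []
    for i, op in enumerate(program):
        uses_data_memory = op in ("load", "store")
        victim = i + MEM_STAGE_OFFSET  # instruction being fetched that cycle
        if uses_data_memory and victim < len(program):
            conflicts.append(victim)
    return conflicts

print(fetch_conflicts(["load", "inst1", "inst2", "inst3", "inst4"]))
# prints: [3]
```

With separate Im and Dm the fetch and the data access no longer share a port, so this check is not needed.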
Data Hazard on r1
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
Data Hazard on r1 (Figure 6.30, page 397, P&H)
- Dependencies backwards in time are hazards
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB, drawn as Im, Reg, ALU, Dm, Reg. Instructions: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
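The "dependencies backwards in time" in the figure can be found mechanically. The sketch below is an illustration, not the lecture's method: `raw_hazards` is a hypothetical helper, instructions are `(op, dest, src1, src2)` tuples, and `window=3` assumes a result cannot be read from the register file until three instructions later, as in the stall slides of this lecture.

```python
def raw_hazards(program, window=3):
    """Return (producer_index, consumer_index, register) for every source
    register written by one of the previous `window` instructions."""
    hazards = []
    for j, (_, _, *sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards

program = [
    ("add", "r1", "r2", "r3"),
    ("sub", "r4", "r1", "r3"),
    ("and", "r6", "r1", "r7"),
    ("or", "r8", "r1", "r9"),
    ("xor", "r10", "r1", "r11"),
]
print(raw_hazards(program))
# prints: [(0, 1, 'r1'), (0, 2, 'r1'), (0, 3, 'r1')]
```

If the register file returns the new value when a register is read and written in the same cycle, `window=2` is the matching setting and the `or` no longer counts as a hazard.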
Option 1: HW Stalls to Resolve Data Hazard
- Dependencies backwards in time are hazards
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instructions: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11. Three bubbles follow the add before sub r4,r1,r3 proceeds.]
But recall use of "Data Stationary Control"
- The Main Control generates the control signals during Reg/Dec
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: the control signals travel with the instruction through the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers.]
Option 1: How HW really stalls pipeline
- HW doesn't change PC => keeps fetching the same instruction & sets control signals to benign values (0)
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instruction order: add r1,r2,r3; stall; stall; stall; sub r4,r1,r3; and r6,r1,r7. Each stall row is a fetch (Im) followed by bubbles.]
Option 2: SW inserts independent instructions
- Worst case inserts NOP instructions
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instruction order: add r1,r2,r3; nop; nop; nop; sub r4,r1,r3; and r6,r1,r7.]
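The worst-case software fix, padding with NOPs, can be sketched as a small pass over the instruction list. This is a hypothetical illustration: `insert_nops` and the tuple encoding `(op, dest, src1, src2)` are assumptions, and `window=3` again assumes a three-instruction delay before a result can be read. A real compiler would first try to move independent instructions into those slots.

```python
NOP = ("nop", None)  # writes no register and reads none

def insert_nops(program, window=3):
    """Pad the instruction stream with NOPs until no instruction reads a
    register written by one of the `window` instructions just before it."""
    scheduled = []
    for instr in program:
        sources = instr[2:]
        while True:
            recent = scheduled[-window:] if window > 0 else []
            if not any(prev[1] in sources for prev in recent):
                break
            scheduled.append(NOP)
        scheduled.append(instr)
    return scheduled

program = [
    ("add", "r1", "r2", "r3"),
    ("sub", "r4", "r1", "r3"),
    ("and", "r6", "r1", "r7"),
]
for op, *rest in insert_nops(program):
    print(op)
# prints add, nop, nop, nop, sub, and (one per line)
```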
Outline of Today's Lecture: Recap and Introduction (5 minutes); Introduction to Hazards (15 minutes); Forwarding; 1 cycle Load Delay (5 minutes); 1 cycle Branch Delay (15 minutes); What makes pipelining hard; Summary (5 minutes)
Option 3 Insight: Data is available! (Figure 6.35, page 415, P&H)
- Pipeline registers already contain the needed data
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instructions: add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
HW Change for Forwarding (Bypassing)
- Increase multiplexors to add paths from pipeline registers
- Assumes register read during write gets new value (otherwise more results to be forwarded)
[Figure: datapath with pipeline registers ID/EX, EX/MEM, MEM/WB, the Zero? test, and Data Memory.]
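The selection made by the added multiplexors can be sketched as follows. The slide does not spell out the conditions, so this follows the usual textbook formulation as an assumption; the function name, argument names, and return labels are invented for illustration, and registers are given as plain integers.

```python
def forward_source(src_reg, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Choose where one ALU operand comes from for the instruction in EX.

    Register 0 is hardwired to zero in MIPS, so it is never forwarded.
    The newer result (EX/MEM) takes priority over the older one (MEM/WB).
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"

# sub r4,r1,r3 in EX while add r1,r2,r3 is one stage ahead:
print(forward_source(1, True, 1, False, 0))   # prints: EX/MEM
# and r6,r1,r7 in EX: sub (writes r4) is in EX/MEM, add (writes r1) in MEM/WB:
print(forward_source(1, True, 4, True, 1))    # prints: MEM/WB
```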
Complete Datapath with Hazard Detection and Forwarding (Figure 6.41 in the text)
[Figure: datapath with PC, instruction memory, registers, sign extend, shift left 2, data memory, the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying the WB, M, and EX control fields, the IF.Flush signal, a hazard detection unit, and a forwarding unit.]
Outline of Today's Lecture: Recap and Introduction (5 minutes); Introduction to Hazards (15 minutes); Forwarding (25 minutes); 1 cycle Load Delay; 1 cycle Branch Delay (15 minutes); What makes pipelining hard; Summary (5 minutes)
From Last Lecture: The Delay Load Phenomenon
[Timing diagram, Cycles 1 to 8: I0: Load, Plus 1, Plus 2, Plus 3, and Plus 4 each go through Ifetch, Reg/Dec, Exec, Mem, Wr, each starting one cycle after the previous instruction.]
- Although Load is fetched during Cycle 1:
  - The data is NOT written into the Reg File until the end of Cycle 5
  - We cannot read this value from the Reg File until Cycle 6
  - 3-instruction delay before the load takes effect
Forwarding reduces Data Hazard to 1 cycle (Figure 6.47, page 420, P&H)
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instructions: lw r1,0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9.]
Option 1: HW Stalls to Resolve Data Hazard
- "Interlock": checks for hazard & stalls
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instruction order: lw r1,0(r2); stall; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9. The stall row is a fetch (Im) followed by bubbles.]
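The check an interlock performs can be written as one condition. The slide does not give it, so this is a sketch following the usual textbook load-use test; `must_stall` and its argument names are hypothetical, and registers are given as plain integers.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in EX is a load and the instruction being
    decoded reads the register that the load will write."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))

# lw r1,0(r2) in EX, sub r4,r1,r3 in decode: one bubble is needed.
print(must_stall(True, 1, 1, 3))   # prints: True
# lw r1,0(r2) in EX, an instruction reading r6 and r7 in decode: no stall.
print(must_stall(True, 1, 6, 7))   # prints: False
```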
Option 2: SW inserts independent instructions
- Worst case inserts NOP instructions
- MIPS I solution: No HW checking
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Instruction order: lw r1,0(r2); nop; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9.]
Software Scheduling to Avoid Load Hazards
Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
    LW   Rb,b
    LW   Rc,c
    ADD  Ra,Rb,Rc
    SW   a,Ra
    LW   Re,e
    LW   Rf,f
    SUB  Rd,Re,Rf
    SW   d,Rd
Fast code:
    LW   Rb,b
    LW   Rc,c
    LW   Re,e
    ADD  Ra,Rb,Rc
    LW   Rf,f
    SW   a,Ra
    SUB  Rd,Re,Rf
    SW   d,Rd
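The benefit of the reordering can be counted. The sketch below is an illustration under stated assumptions: forwarding is present, so the only remaining stall is one cycle when an instruction uses a register loaded by the LW immediately before it. The helper name and the `(op, dest, sources)` encoding are made up for this example.

```python
def load_use_stalls(program):
    """Count instructions that read a register loaded by the LW just before."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls

slow = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("ADD", "Ra", ["Rb", "Rc"]),
    ("SW", None, ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []), ("SUB", "Rd", ["Re", "Rf"]),
    ("SW", None, ["Rd"]),
]
fast = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("LW", "Rf", []), ("SW", None, ["Ra"]),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]
print(load_use_stalls(slow), load_use_stalls(fast))
# prints: 2 0
```

Both versions contain the same eight instructions; the fast version only moves a load between each LW and the instruction that uses its result.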
Outline of Today's Lecture: Recap and Introduction (5 minutes); Introduction to Hazards (15 minutes); Forwarding (25 minutes); 1 cycle Load Delay (5 minutes); 1 cycle Branch Delay; What makes pipelining hard; Summary (5 minutes)
From Last Lecture: The Delay Branch Phenomenon
[Timing diagram, Cycles 4 to 11: 12: Beq (target is 1000), 16: R-type, 20: R-type, 24: R-type, and 1000: Target of Br each go through Ifetch, Reg/Dec, Exec, Mem, Wr, each starting one cycle after the previous instruction.]
- Although Beq is fetched during Cycle 4:
  - Target address is NOT written into the PC until the end of Cycle 7
  - Branch's target is NOT fetched until Cycle 8
  - 3-instruction delay before the branch takes effect
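The cycle numbers on this slide follow from where in the pipeline the PC is written. A minimal sketch, assuming stages are numbered from 1 (Ifetch) and the PC is written at the end of the given stage; `branch_timing` is a hypothetical helper, not something from the lecture.

```python
def branch_timing(fetch_cycle, pc_write_stage=4):
    """Return (cycle the PC is written, cycle the target is fetched,
    number of instructions fetched in between)."""
    pc_write_cycle = fetch_cycle + pc_write_stage - 1
    target_fetch_cycle = pc_write_cycle + 1
    delay = target_fetch_cycle - fetch_cycle - 1
    return pc_write_cycle, target_fetch_cycle, delay

print(branch_timing(4))
# prints: (7, 8, 3)
```

Writing the PC at the end of an earlier stage shrinks the delay by the same number of cycles, for example `branch_timing(4, pc_write_stage=2)` gives a delay of 1.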
Control Hazard on Branches: 3 stage stall
[Figure: pipeline diagram, time in clock cycles CC 1 to CC 9 across, program execution order (in instructions) down:
    40 beq $1,$3,36
    44 and $12,$2,$5
    48 or  $13,$6,$2
    52 add $14,$2,$2
    80 ld  $4,$7,100
Each instruction passes through IM, Reg, ALU, DM, Reg.]
Branch Stall Impact
- If CPI = 1, 30% branch, stall 3 cycles => new CPI = 1.9!
- 2 part solution:
  - Determine branch taken or not sooner, AND
  - Compute taken branch address earlier
- Solution Option 1:
  - Move Zero test to ID/RF stage
  - Adder to calculate new PC in ID/RF stage
  - 1 clock cycle penalty for branch vs. 3
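The CPI figure above is a one-line calculation. The sketch below uses a made-up helper name and assumes, as the slide does, that every branch pays the full stall penalty.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Average CPI when each branch adds `stall_cycles` stall cycles."""
    return base_cpi + branch_fraction * stall_cycles

# 30% branches, 3-cycle stall (zero test and PC update late in the pipeline)
print(round(cpi_with_branch_stalls(1, 0.30, 3), 2))  # prints: 1.9
# 30% branches, 1-cycle stall (zero test and adder moved to ID/RF)
print(round(cpi_with_branch_stalls(1, 0.30, 1), 2))  # prints: 1.3
```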
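Option 1: move HW forward to reduce branch delay
Data Path before change
[Figure: five-stage datapath. Stages: Instruction Fetch; Inst. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Blocks shown: PC, Instruction Memory, ADD (PC + 4), Registers (read with IR 6..10 and IR 11..15, written with MEM/WB.IR), Sign Extend (16 to 32), Zero? test, Data Memory.]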
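Branch Delay now 1 clock cycle
Data Path after change
[Figure: five-stage datapath. Stages: Instruction Fetch; Inst. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Blocks shown: PC, Instruction Memory, ADD (PC + 4), a second ADD for the branch target, Zero? test, Registers (read with IR 6..10 and IR 11..15, written with MEM/WB.IR), Sign Extend (16 to 32), Data Memory.]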
Option 2: No Stalls, Define Branch as Delayed
- Insert an instruction after the branch and allow it to execute always
- Worst case, SW inserts NOP into the branch delay slot if no instruction can be found
- Where to get instructions to fill the branch delay slot?
  - From before the branch instruction, for example
        sw r1,0(r2); beqd r0,r2,T
    changes to
        beqd r0,r2,T; sw r1,0(r2)
  - From the target address: only valuable when the branch is taken
  - From fall through: only valuable when the branch is not taken
- Compiler effectiveness for single branch delay slot:
  - Fills about 60% of branch delay slots
  - About 80% of instructions executed in branch delay slots are useful in computation
  - About 50% (60% x 80%) of slots usefully filled
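Two points from this slide can be checked in code: whether an instruction from before the branch may move into the delay slot, and the fraction of slots that end up useful. This is a sketch with hypothetical helper names; the legality test covers only the register dependence on the branch's operands and ignores every other constraint a compiler would check.

```python
def can_fill_from_before(candidate_dest, branch_sources):
    """An instruction from before the branch may move into the delay slot
    only if it does not write a register the branch compares."""
    return candidate_dest not in branch_sources

def useful_slot_fraction(fill_rate, useful_rate):
    """Fraction of all delay slots holding an instruction that does useful work."""
    return fill_rate * useful_rate

# sw r1,0(r2) writes no register, so it can follow beqd r0,r2,T.
print(can_fill_from_before(None, ("r0", "r2")))   # prints: True
# An instruction that writes r2 would change the branch outcome.
print(can_fill_from_before("r2", ("r0", "r2")))   # prints: False
print(round(useful_slot_fraction(0.60, 0.80), 2)) # prints: 0.48
```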
Example: Text Figure 6.52
[Figure: two snapshots of the datapath with hazard detection unit, forwarding unit, and IF.Flush.
Clock 3: and $12,$2,$5 | beq $1,$3,7 | sub $10,$4,$8 | before<1> | before<2>
Clock 4: lw $4,50($7) | bubble (nop) | beq $1,$3,7 | sub $10,... | before<1>
Instructions are listed from the fetch stage (left) to the write-back stage (right).]
Outline of Today's Lecture: Recap and Introduction (5 minutes); Introduction to Hazards (15 minutes); Forwarding (25 minutes); 1 cycle Load Delay (5 minutes); 1 cycle Branch Delay (15 minutes); What makes pipelining hard; Summary (5 minutes)
When is pipelining hard?
- Interrupts: 5 instructions executing in a 5 stage pipeline
  - How to stop the pipeline?
  - Restart?
  - Who caused the interrupt?

Stage   Problem interrupts occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic interrupt
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation

- Load with data page fault, Add with instruction page fault?
- Solution 1: interrupt vector/instruction
- Solution 2: interrupt ASAP, restart everything incomplete
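The "Load with data page fault, Add with instruction page fault" case is one where two instructions in flight fault in the same cycle. One way to read Solution 1 is that each instruction carries its own exception status and the oldest faulting instruction is handled first. The sketch below illustrates only that ordering rule; the helper name and the `(program_index, stage, exception)` encoding are assumptions, not the lecture's design.

```python
def first_exception_in_program_order(in_flight):
    """in_flight: list of (program_index, stage, exception_or_None).
    Return the entry for the oldest instruction that has an exception
    pending, or None if nothing faulted."""
    faulting = [entry for entry in in_flight if entry[2] is not None]
    return min(faulting, key=lambda entry: entry[0], default=None)

in_flight = [
    (0, "MEM", "data page fault"),         # the Load
    (1, "EX", None),
    (2, "ID", None),
    (3, "IF", "instruction page fault"),   # the Add
]
print(first_exception_in_program_order(in_flight))
# prints: (0, 'MEM', 'data page fault')
```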
Datapath with Exception Handling (Text Figure 6.55)
- Add a Cause register, an Exception PC, and the constant address of the exception handling routine
[Figure: datapath with hazard detection unit, forwarding unit, flush signals IF.Flush, ID.Flush, EX.Flush, the constant 40000040 as a PC input, the Cause register, the Except PC register, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]
Review: Summary of Pipelining Basics
- Speed Up ≤ Pipeline Depth (number of stages); if ideal CPI is 1, then:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$
- Hazards limit performance on computers:
  - structural: need more HW resources
  - data: need forwarding, compiler scheduling
  - control: early evaluation & PC, delayed branch, prediction
- Increasing length of pipe increases impact of hazards, since pipelining helps instruction bandwidth, not latency
- Compilers key to reducing cost of data and control hazards
  - load delay slots
  - branch delay slots
- Exceptions, Instruction Set, FP make pipelining harder
- Longer pipelines => Branch prediction, more instruction parallelism?
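The speedup formula can be evaluated directly. A minimal sketch with a hypothetical function name; the clock cycle arguments default to equal values, which is an assumption made only to keep the example small.

```python
def pipeline_speedup(depth, stall_cycles_per_instr,
                     clock_unpipelined=1.0, clock_pipelined=1.0):
    """Speedup = depth / (1 + stalls per instruction) * clock ratio,
    valid when the ideal CPI is 1."""
    return (depth / (1 + stall_cycles_per_instr)
            * (clock_unpipelined / clock_pipelined))

# No stalls: the speedup equals the pipeline depth.
print(pipeline_speedup(5, 0.0))             # prints: 5.0
# 0.9 stall cycles per instruction (30% branches, 3-cycle stall):
print(round(pipeline_speedup(5, 0.9), 2))   # prints: 2.63
```

The second call reuses the branch stall numbers from earlier in the lecture and shows how far hazards pull the speedup below the pipeline depth.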