DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO
ADVANCED COMPUTER ARCHITECTURES / ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)
Slides by: Pedro Tomás
Additional reading: John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, Morgan Kaufmann, 2011
Outline
Dynamic instruction scheduling:
- Revision of Scoreboard
- Tomasulo algorithm
- Example execution using Tomasulo's algorithm
Dynamic scheduling: Scoreboard revision
Divide the ID/OF stage into two parts:
- ISSUE: decode the instruction and check for structural and WAW hazards. Once all structural and WAW conflicts are resolved, issue the instruction.
- READ OPERANDS (Dispatch): wait until all data hazards are resolved, then read the operands from the register file and dispatch the instruction to execution.
[Figure: scoreboard pipeline. The IF and ISSUE stages run in order; dispatch (DISP.), EX/MEM and WB run out of order.]
Scoreboard revision: Issue stage
Running example (the instruction status table records the cycle in which each instruction completes Issue, Disp., EX and WB):
  L.D   F6,34(R2)
  L.D   F2,45(R3)
  MUL.D F0,F2,F4
  SUB.D F8,F6,F2
  DIV.D F10,F0,F6
  ADD.D F6,F8,F2
Issue the next instruction if no WAW or structural hazard is found:
- WAW hazard: the destination register is already going to be written by another instruction
- Structural hazard: the FU is already busy
If no hazard is found:
- Fill the correct row of the FU status table (rows MULT1, MULT2, ADD, DIV; fields Busy, Op, Fi (DR), Fj (SA), Fk (SB), Qj/Qk (Gen by FU?), Rj/Rk (Data Ready?))
- In the register result status table (F0, F2, ..., F30), assign the FU that will write to the destination register
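The issue test above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `can_issue` and the two dictionaries (`fu_busy` for the Busy column of the FU status table, `result_fu` for the register result status table) are hypothetical names chosen for this sketch, not part of the slides.

```python
def can_issue(fu_busy, result_fu, fu, dest):
    """Scoreboard issue test (sketch): no structural and no WAW hazard.

    fu_busy   -- hypothetical dict: FU name -> True if the FU is busy
    result_fu -- hypothetical dict: register -> FU that will write it
    """
    structural = fu_busy.get(fu, False)      # the required FU is already busy
    waw = result_fu.get(dest) is not None    # dest is already going to be written
    return not structural and not waw


# Illustrative state: the adder is busy and will write F8.
fu_busy = {"MULT1": False, "MULT2": False, "ADD": True, "DIV": False}
result_fu = {"F8": "ADD"}

print(can_issue(fu_busy, result_fu, "MULT1", "F0"))  # no hazard
print(can_issue(fu_busy, result_fu, "ADD", "F6"))    # structural hazard
print(can_issue(fu_busy, result_fu, "MULT1", "F8"))  # WAW hazard
```

Only when the test passes would the FU status row be filled and the register result status updated.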
Scoreboard revision: Dispatch stage
Dispatch to execution all instructions whose operands are valid.
Example (FU status, Data Ready? columns):
  FU     Rj   Rk   Action
  MULT1  YES  YES  Dispatch and set Rj, Rk to No
  ADD    NO   YES  Don't dispatch
Scoreboard revision: Execute stage
Wait for the instruction to complete execution, then inform the scoreboard that it has finished.
Scoreboard revision: Write-back stage
Write the result to the destination register if no WAR hazard is found:
- WAR hazard: an instruction still requires the value in the register; this happens if a preceding instruction is stuck at the dispatch stage waiting for some other value
On write:
- Clear the slot in the FU status table
- Set the register value as valid in the register result status table
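A minimal sketch of the WAR test at write-back follows. It assumes the convention from the additional reading (Hennessy and Patterson) that Rj/Rk = Yes means "operand ready but not yet read"; the `FUStatus` class and `can_write_back` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FUStatus:
    busy: bool = False
    fi: str = ""      # destination register
    fj: str = ""      # source register j
    fk: str = ""      # source register k
    rj: bool = False  # True: operand j is ready but has not been read yet
    rk: bool = False  # True: operand k is ready but has not been read yet


def can_write_back(fus, name):
    """Allow the write only if no other busy FU still has to read the old value."""
    dest = fus[name].fi
    for other, s in fus.items():
        if other == name or not s.busy:
            continue
        if (s.fj == dest and s.rj) or (s.fk == dest and s.rk):
            return False  # WAR hazard: 'other' has not read dest yet
    return True


# Illustrative state: DIV.D F10,F0,F6 waits at dispatch for F0 and has not
# read F6 yet, while ADD.D F6,F8,F2 has finished executing.
fus = {
    "MULT1": FUStatus(True, "F0", "F2", "F4", False, False),
    "ADD": FUStatus(True, "F6", "F8", "F2", False, False),
    "DIV": FUStatus(True, "F10", "F0", "F6", False, True),
}
print(can_write_back(fus, "ADD"))    # blocked by DIV (WAR on F6)
print(can_write_back(fus, "MULT1"))  # nobody holds an unread, ready F0
```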
Scoreboard update example
Instruction status (cycle in which each stage completes):
  Instruction        Issue  Disp.  EX     WB   State at cycle 18
  L.D   F6,34(R2)    1      2      3-4    5    Completed
  L.D   F2,45(R3)    6      7      8-9    10   Completed
  MUL.D F0,F2,F4     7      11     12-21  22   Executing
  SUB.D F8,F6,F2     8      11     12-13  14   Completed
  DIV.D F10,F0,F6    9      23     24-63  64   At Dispatch
  ADD.D F6,F8,F2     15     16     17-18  24   Ending EX
CYCLE 18: The ADD.D has finished executing, but will stall on cycle 19 because of DIV.D:
- DIV.D precedes ADD.D
- DIV.D was stalled at the dispatch stage because of a RAW hazard on the value of F0
- DIV.D reads both operands at the same time, so it has not yet read F6, which ADD.D wants to overwrite (WAR hazard)
Tomasulo algorithm
Proposed by Robert Tomasulo in 1966:
- Initially proposed to overcome the long latencies of both memory accesses and floating-point operations
- First implemented on the IBM 360/91
- The algorithm proved to be far more powerful than anticipated, and is used in almost all modern superscalar processors
Tomasulo's algorithm: General idea
Instead of centralizing the control in a scoreboard, distribute it among the different components:
- Instructions no longer wait at a dispatch stage; instead, they are issued directly to reservation stations associated with the functional units
- Once an instruction is issued, the available operand values are copied directly to the reservation station (this works as a form of register renaming)
- If an operand is not available, store which instruction generates the result (given by the reservation station holding that instruction)
- When busy, the reservation stations hold instructions, so an instruction can be identified by the reservation station where it is being held
[Figure: the IF and ISSUE stages feed the reservation stations for FU 1 (e.g., ALU) and FU 2 (e.g., LD/ST); both functional units write to the Common Data Bus (CDB).]
Tomasulo's algorithm: General idea
- Instruction issue stalls if all reservation stations for the given operation are busy
- Functional units (FUs) can be pipelined and may have different numbers of reservation stations
- All units write to a CDB, which forwards the results to the reservation stations and the RF
[Figure: the IF and ISSUE stages and the register file feed the reservation stations: S1-S4 and L1-L4 (address calculation, MEMORY), I1-I4 for FU 2 (ALU), A1-A4 for FU 3 (FP ADD), M1-M3 for FU 4 (FP MULT) and D1-D2 for FU 5 (/FP DIV); all units write to the Common Data Bus (CDB).]
Tomasulo's algorithm: Reservation stations
Information held in each reservation station (Qn):
- Busy: station availability
- Op: operation to execute
- Vj, Vk: values of operands j, k (valid if the operands are ready)
- Qj, Qk: readiness of operands j, k (label of the reservation station holding the instruction that will generate the result)
Load/store operations have an additional field for indexed loads/stores, e.g., M[R[AA] + Imm] R[BA]:
- A: used to store the immediate and, later, the effective load/store address
Additional information stored in the RF:
- Each integer register (R0, R1, ..., Rn) and each FP register (F0, F1, ..., Fn) holds its data (Data 0 ... Data n) and a readiness label (Q0 ... Qn)
- The label marks each register as ready (value of zero) or not ready (indicating the reservation station holding the instruction that generates the value)
Tomasulo's algorithm: Issue stage
1. Decode the instruction: identify both the operation and the operands
2. Check whether the required functional unit has at least one reservation station available (i.e., not busy):
   - If no reservation station is available (structural hazard), stall
   - If a reservation station is available, issue the instruction, indicating:
     a) the operation to execute;
     b) the value of all operands that are available, i.e., the value stored in the register file (RF);
     c) if an operand is not available, the reservation station holding the instruction that will generate the corresponding value
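The issue steps can be sketched as follows. This is a simplified illustration, not the hardware logic: the `Station` class, the `issue` function and the dictionaries `regs` (register values) and `reg_q` (register readiness labels, with a missing or `None` entry meaning "ready") are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None  # operand values (valid when the Q field is None)
    vk: Optional[float] = None
    qj: Optional[str] = None    # producing station, or None if the operand is ready
    qk: Optional[str] = None


def issue(stations, regs, reg_q, op, dest, src_j, src_k):
    """Issue one instruction; return the station used, or None to signal a stall."""
    name = next((n for n, s in stations.items() if not s.busy), None)
    if name is None:
        return None  # structural hazard: every station of this unit is busy
    s = stations[name]
    s.busy, s.op = True, op
    # For each operand: copy the value if ready, otherwise record the producer.
    s.qj = reg_q.get(src_j)
    s.vj = regs[src_j] if s.qj is None else None
    s.qk = reg_q.get(src_k)
    s.vk = regs[src_k] if s.qk is None else None
    reg_q[dest] = name  # the destination is now produced by this station
    return name


stations = {"MULT1": Station(), "MULT2": Station()}
regs = {"F0": 0.0, "F2": 0.0, "F4": 2.5, "F6": 1.0, "F10": 0.0}
reg_q = {"F2": "LD/ST2"}  # F2 is still being loaded

print(issue(stations, regs, reg_q, "MUL.D", "F0", "F2", "F4"))   # MULT1
print(issue(stations, regs, reg_q, "DIV.D", "F10", "F0", "F6"))  # MULT2
print(issue(stations, regs, reg_q, "MUL.D", "F4", "F4", "F6"))   # None (stall)
```

Copying the value of F4 at issue time is what makes later writes to F4 harmless for the instruction already in the station.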
Tomasulo's algorithm: Execute stage
1. If a reservation station has all operands available and there is a functional unit available, start executing the instruction
2. Monitor (snoop) writes to the common data bus (CDB); if a value is written on the CDB and that value is required by an instruction in a reservation station, retrieve it and store it in the corresponding field of the reservation station
Example:
  ...
  DIV.D F4,F0,F2
  DIV.D F6,F4,F2
  DADD.D F0,F4,F6
  ...
- The functional unit FU5 (floating-point division) writes a value to the CDB: the result of the instruction in reservation station D2
- The reservation stations D1 and A3 hold instructions that require that value; they take the result and store it in the corresponding fields
[Figure: datapath with reservation stations S1-S4, L1-L4, I1-I4, A1-A4, M1-M3 and D1-D2, the register file and the Common Data Bus (CDB); FU 5 writes the result of the instruction in reservation station D2.]
Tomasulo's algorithm: Writing on the CDB
1. When writing a value on the CDB, write:
   - the value, plus
   - the label of the reservation station where the instruction was stored
Whenever a reservation station (or register) needs a value, it takes it from the CDB.
Example:
  ...
  DIV.D F4,F0,F2
  DIV.D F6,F4,F2
  DADD.D F0,F4,F6
  ...
- The functional unit FU5 (floating-point division) writes a value to the CDB
- The reservation stations D1 and A3 hold instructions that require that value; they take the result and store it in the corresponding fields
- Reservation station D2 holds the first division; reservation station D1 holds the second division
- The DADD.D instruction is waiting for the values produced by reservation stations D2 and D1
Reservation station A3:
- Station availability: Busy
- Operation to execute: DADD.D
- Vj, Vk (values of operands j, k): invalid data
- Qj = D2: wait for the value being produced by the instruction in reservation station D2
- Qk = D1: wait for the value being produced by the instruction in reservation station D1
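The CDB write and the snooping performed by the reservation stations and registers can be sketched as below. The `broadcast` function and the dictionary layout are assumptions made for this illustration, and the operand values (F0 = 8.0, F2 = 2.0) are invented; only the station labels D1, D2 and A3 come from the example.

```python
def broadcast(tag, value, stations, regs, reg_q):
    """Put (tag, value) on the CDB; every consumer waiting on tag captures it."""
    for s in stations.values():
        if s["qj"] == tag:
            s["vj"], s["qj"] = value, None
        if s["qk"] == tag:
            s["vk"], s["qk"] = value, None
    for reg, q in reg_q.items():
        if q == tag:  # the register still expects its value from this station
            regs[reg] = value
            reg_q[reg] = None


# State before D2 (first division, F4 = F0 / F2) writes its result.
stations = {
    "D1": {"op": "DIV.D", "vj": None, "vk": 2.0, "qj": "D2", "qk": None},
    "A3": {"op": "DADD.D", "vj": None, "vk": None, "qj": "D2", "qk": "D1"},
}
regs = {"F0": 8.0, "F2": 2.0, "F4": 0.0, "F6": 0.0}
reg_q = {"F4": "D2", "F6": "D1", "F0": "A3"}

broadcast("D2", 8.0 / 2.0, stations, regs, reg_q)
print(stations["D1"])  # both operands ready: the second division can start
print(stations["A3"])  # still waiting on D1 for its second operand
print(regs["F4"], reg_q["F4"])
```

Matching on the station label rather than on the register name is what lets several consumers pick up the same result in a single bus write.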
Tomasulo's algorithm: Load/Store unit
- The load/store unit is seen as a functional unit with read/write (load/store) buffers to the memory
- The load/store buffers can be seen as reservation stations
[Figure: address calculation feeding the store buffers S1-S4 and the load buffers L1-L4, connected to MEMORY and to the Common Data Bus (CDB).]
Tomasulo's algorithm: Solving hazards
- RAW hazards: solved by letting an instruction wait in a reservation station for the corresponding value
- WAR / WAW hazards: solved by renaming the registers (use of reservation stations)
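To see why renaming removes WAR and WAW hazards, the sketch below renames the six-instruction example sequence used in these slides. The tags T0, T1, ... are hypothetical stand-ins for reservation station labels, and the `rename` function is an assumption made for illustration.

```python
def rename(program):
    """Give every result a fresh tag and point each source at its latest producer."""
    latest = {}  # architectural register -> tag of its most recent producer
    renamed = []
    for i, (op, dest, srcs) in enumerate(program):
        tag = f"T{i}"
        # A source with no in-flight producer keeps its register name.
        renamed.append((op, tag, tuple(latest.get(s, s) for s in srcs)))
        latest[dest] = tag
    return renamed


program = [
    ("L.D", "F6", ("R2",)),
    ("L.D", "F2", ("R3",)),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]
for row in rename(program):
    print(row)
```

After renaming, DIV.D reads tag T0 (the first load) while ADD.D writes tag T5, so the WAR hazard on F6 disappears; only the true (RAW) dependences remain.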
Tomasulo's algorithm: Example
Consider the execution of the following instructions:
  L.D   F6,34(R2)
  L.D   F2,45(R3)
  MUL.D F0,F2,F4
  SUB.D F8,F6,F2
  DIV.D F10,F0,F6
  ADD.D F6,F8,F2
on a processor with:
- Non-pipelined functional units:
  - 1x Integer ALU, with 1 cycle latency
  - 1x FP multiplier, with 10 cycles latency
  - 1x FP Adder/subtractor, with 2 cycles latency
  - 1x /FP Division, with 40 cycles latency
- Load/store unit with 2 cycles latency (address calculation + memory access)
- Reservation stations:
  - 3 load/store buffers
  - 1 slot for integer operations
  - 2 slots for FP multiplication/division
  - 2 slots for FP addition/subtraction
Similar architecture to the CDC6600, except that we are now using Tomasulo's algorithm instead of a Scoreboard
Tomasulo execution example: initial state
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB
  L.D   F6,34(R2)    -      -   -
  L.D   F2,45(R3)    -      -   -
  MUL.D F0,F2,F4     -      -   -
  SUB.D F8,F6,F2     -      -   -
  DIV.D F10,F0,F6    -      -   -
  ADD.D F6,F8,F2     -      -   -
Reservation stations (fields: Busy, Op, Vj (OpA), Vk (OpB), Qj, Qk, A (address)): LD/ST buffer 1, LD/ST buffer 2, LD/ST buffer 3, FP Mult/Div 1, FP Mult/Div 2, FP Adder 1, FP Adder 2; all empty
Register status (Q) for R1, R2, R3, F0, F2, F4, F6, F8, F10: all empty
Tomasulo execution example: cycle 1
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB
  L.D   F6,34(R2)    1      -   -
  L.D   F2,45(R3)    -      -   -
  MUL.D F0,F2,F4     -      -   -
  SUB.D F8,F6,F2     -      -   -
  DIV.D F10,F0,F6    -      -   -
  ADD.D F6,F8,F2     -      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj      Qk     A
  LD/ST buffer 1  Yes   L.D    R2  0   Ready   Ready  34
Register status (Q): F6 = LD/ST1
Tomasulo execution example: cycle 2
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      -   -   Calculated effective address
  L.D   F2,45(R3)    2      -   -
  MUL.D F0,F2,F4     -      -   -
  SUB.D F8,F6,F2     -      -   -
  DIV.D F10,F0,F6    -      -   -
  ADD.D F6,F8,F2     -      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj      Qk     A
  LD/ST buffer 1  Yes   L.D    R2  0   Ready   Ready  34+R2
  LD/ST buffer 2  Yes   L.D    R3  0   Ready   Ready  45
Register status (Q): F2 = LD/ST2, F6 = LD/ST1
Tomasulo execution example: cycle 3
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   -   Finish loading the value
  L.D   F2,45(R3)    2      -   -   Calculated effective address
  MUL.D F0,F2,F4     3      -   -
  SUB.D F8,F6,F2     -      -   -
  DIV.D F10,F0,F6    -      -   -
  ADD.D F6,F8,F2     -      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj      Qk     A
  LD/ST buffer 1  Yes   L.D    R2  0   Ready   Ready  34+R2
  LD/ST buffer 2  Yes   L.D    R3  0   Ready   Ready  45+R3
  FP Mult/Div 1   Yes   MUL.D  -   F4  LD/ST2  Ready
The value of F4 is copied, which is equivalent to register renaming.
Register status (Q): F0 = FP MULT1, F2 = LD/ST2, F6 = LD/ST1
Tomasulo execution example: cycle 4
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4   Write the result
  L.D   F2,45(R3)    2      4   -
  MUL.D F0,F2,F4     3      -   -
  SUB.D F8,F6,F2     4      -   -
  DIV.D F10,F0,F6    -      -   -
  ADD.D F6,F8,F2     -      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj      Qk      A
  LD/ST buffer 2  Yes   L.D    R3  0   Ready   Ready   45+R3
  FP Mult/Div 1   Yes   MUL.D  -   F4  LD/ST2  Ready
  FP Adder 1      Yes   SUB.D  F6  -   Ready   LD/ST2
The value of F6 is forwarded from the CDB.
Register status (Q): F0 = FP MULT1, F2 = LD/ST2, F6 = LD/ST1, F8 = FP ADD1
Tomasulo execution example: cycle 5
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5   Write the result
  MUL.D F0,F2,F4     3      -   -   10 cycles left
  SUB.D F8,F6,F2     4      -   -   2 cycles left
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     -      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
  FP Adder 1      Yes   SUB.D  F6  F2  Ready  Ready
The value of F2 is forwarded from the CDB; the waiting instructions become ready and start executing.
Register status (Q): F0 = FP MULT1, F2 = LD/ST2, F8 = FP ADD1, F10 = FP MULT2
Tomasulo execution example: cycle 6
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      -   -   9 cycles left
  SUB.D F8,F6,F2     4      -   -   1 cycle left
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     6      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
  FP Adder 1      Yes   SUB.D  F6  F2  Ready  Ready
  FP Adder 2      Yes   ADD.D  -   F2  FP A1  Ready
Register status (Q): F0 = FP MULT1, F6 = FP ADD2, F8 = FP ADD1, F10 = FP MULT2
Tomasulo execution example: cycle 7
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      -   -   8 cycles left
  SUB.D F8,F6,F2     4      7   -   Finished execution
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     6      -   -
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
  FP Adder 1      Yes   SUB.D  F6  F2  Ready  Ready
  FP Adder 2      Yes   ADD.D  -   F2  FP A1  Ready
Register status (Q): F0 = FP MULT1, F6 = FP ADD2, F8 = FP ADD1, F10 = FP MULT2
Tomasulo execution example: cycle 8
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      -   -   7 cycles left
  SUB.D F8,F6,F2     4      7   8   Write the result
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     6      -   -   2 cycles left
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
  FP Adder 2      Yes   ADD.D  F8  F2  Ready  Ready
The value of F8 is forwarded from the CDB; the instruction becomes ready and starts executing.
Register status (Q): F0 = FP MULT1, F6 = FP ADD2, F8 = FP ADD1, F10 = FP MULT2
Tomasulo execution example: cycle 10
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      -   -   5 cycles left
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     6      10  -   Finished execution
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
  FP Adder 2      Yes   ADD.D  F8  F2  Ready  Ready
Register status (Q): F0 = FP MULT1, F6 = FP ADD2, F10 = FP MULT2
Tomasulo execution example: cycle 11
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      -   -   4 cycles left
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     6      10  11  Write the result
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
Register status (Q): F0 = FP MULT1, F6 = FP ADD2, F10 = FP MULT2
Tomasulo execution example: cycle 15
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      15  -   Finished execution
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      -   -
  ADD.D F6,F8,F2     6      10  11
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 1   Yes   MUL.D  F2  F4  Ready  Ready
  FP Mult/Div 2   Yes   DIV.D  -   F6  FP M1  Ready
Register status (Q): F0 = FP MULT1, F10 = FP MULT2
Tomasulo execution example: cycle 16
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      15  16  Write the result
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      -   -   40 cycles left
  ADD.D F6,F8,F2     6      10  11
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 2   Yes   DIV.D  F0  F6  Ready  Ready
The value of F0 is forwarded from the CDB; the instruction becomes ready and starts executing.
Register status (Q): F0 = FP MULT1, F10 = FP MULT2
Tomasulo execution example: cycle 56
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      15  16
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      56  -   Finished execution
  ADD.D F6,F8,F2     6      10  11
Reservation stations (busy entries only):
  Station         Busy  Op     Vj  Vk  Qj     Qk
  FP Mult/Div 2   Yes   DIV.D  F0  F6  Ready  Ready
Register status (Q): F10 = FP MULT2
Tomasulo execution example: cycle 57
Instruction status (not required in Tomasulo, used only for illustration):
  Instruction        Issue  EX  WB  Note
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      15  16
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      56  57  Write the result
  ADD.D F6,F8,F2     6      10  11
Reservation stations: no busy entries
Register status (Q): F10 = FP MULT2
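The cycle numbers in this walkthrough can be reproduced with a small timing model. It is a deliberately simplified sketch, not a full simulator: it assumes one instruction issues per cycle, that execution finishes `latency` cycles after the later of the issue cycle and the last operand write-back, and that write-back takes the following cycle. It ignores contention for reservation stations, functional units and the CDB, none of which occurs in this particular example. The names `Instr`, `LATENCY` and `tomasulo_timing` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str
    dest: str
    srcs: tuple


# Latencies from the example configuration (load = address calc + memory access).
LATENCY = {"L.D": 2, "MUL.D": 10, "SUB.D": 2, "ADD.D": 2, "DIV.D": 40}

PROGRAM = [
    Instr("L.D", "F6", ("R2",)),
    Instr("L.D", "F2", ("R3",)),
    Instr("MUL.D", "F0", ("F2", "F4")),
    Instr("SUB.D", "F8", ("F6", "F2")),
    Instr("DIV.D", "F10", ("F0", "F6")),
    Instr("ADD.D", "F6", ("F8", "F2")),
]


def tomasulo_timing(program):
    """Return (op, issue, ex_done, wb) per instruction under the assumptions above."""
    wb_of = {}  # register -> WB cycle of its latest producer at issue time (renaming)
    rows = []
    for n, ins in enumerate(program, start=1):
        issue = n
        # Operands are available once every in-flight producer has written the CDB.
        ready = max([issue] + [wb_of[r] for r in ins.srcs if r in wb_of])
        ex_done = ready + LATENCY[ins.op]
        wb = ex_done + 1
        wb_of[ins.dest] = wb
        rows.append((ins.op, issue, ex_done, wb))
    return rows


for row in tomasulo_timing(PROGRAM):
    print(row)
```

Because DIV.D looks up the producer of F6 when it issues, it depends on the first load and not on the later ADD.D, which is why ADD.D can write back at cycle 11 without waiting.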
Tomasulo execution example: summary
Instruction status (Tomasulo):
  Instruction        Issue  EX  WB
  L.D   F6,34(R2)    1      3   4
  L.D   F2,45(R3)    2      4   5
  MUL.D F0,F2,F4     3      15  16
  SUB.D F8,F6,F2     4      7   8
  DIV.D F10,F0,F6    5      56  57
  ADD.D F6,F8,F2     6      10  11
IN ORDER:
- Issue
OUT OF ORDER:
- EX
- WB
Instruction status (Scoreboard):
  Instruction        Issue  Disp.  EX  WB
  L.D   F6,34(R2)    1      2      3   4
  L.D   F2,45(R3)    5      6      7   8
  MUL.D F0,F2,F4     6      9      19  20
  SUB.D F8,F6,F2     7      9      11  12
  DIV.D F10,F0,F6    8      21     61  62
  ADD.D F6,F8,F2     13     14     16  22
IN ORDER:
- Issue
OUT OF ORDER:
- Disp.
- EX
- WB
ISSUE: Speedup = 13/6 = 2.17
WB: Speedup = 62/57 = 1.09
Note: Additional gains are achieved by easing the implementation of other architectural changes
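The two speedup figures are simple ratios of the last issue cycle and the last write-back cycle in each table. A small check, with hypothetical variable names:

```python
# Last issue and last write-back cycles read from the two tables.
scoreboard = {"last_issue": 13, "last_wb": 62}
tomasulo = {"last_issue": 6, "last_wb": 57}

issue_speedup = scoreboard["last_issue"] / tomasulo["last_issue"]
wb_speedup = scoreboard["last_wb"] / tomasulo["last_wb"]

print(round(issue_speedup, 2))  # 2.17
print(round(wb_speedup, 2))     # 1.09
```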
Tomasulo vs Scoreboard
- Structural hazards: Scoreboard stalls the pipeline; Tomasulo stalls the pipeline
- WAW hazards: Scoreboard stalls the pipeline; Tomasulo solves them by applying renaming (use of reservation stations)
- WAR hazards: Scoreboard delays writing the result; Tomasulo solves them by applying renaming (use of reservation stations)
- Control structure: Scoreboard is centralized in the scoreboard; Tomasulo is distributed in reservation stations
- Forwarding: Scoreboard is hard to apply; Tomasulo applies it automatically through the CDB
- Simultaneous writes: in Scoreboard, delayed writing may lead to structural hazards; in Tomasulo, simultaneous access to the CDB may lead to structural hazards
- Instruction window: Scoreboard smaller; Tomasulo larger
Next lesson
Dynamic techniques to extract parallelism:
- More on Tomasulo
- Dynamic branch prediction