Tomasulo Algorithm. Developed at IBM and first implemented in IBM s 360/91

Size: px
Start display at page:

Download "Tomasulo Algorithm. Developed at IBM and first implemented in IBM s 360/91"

Transcription

1 Tomasulo Algorithm Developed at IBM and first implemented in IBM s 360/91 IBM wanted to use the existing compiler instead of a specialized compiler for high end machines. Tracks when operands are available to minimize RAW, and uses register renaming to minimize WAW hazards Used in Alpha 21264, HP8600 PowerPC G4, MIPS R12000, X86 (with a RISC core) AMD Athlon, PIII, Xeon. In-order issue, but out-of-order execution0 The original IBM design uses pipelined FU s, in he example we will use multiple FU s (same idea, but with RISC machine).. (In Chapter 3.2)

2 Tomasulo Algorithm RAW hazards are avoided by executing the instructions only when its operands are ready. Register renaming are used by renaming all destination registers. DIV.D F0,F2,F4 DIV.D F0,F2,F4 ADD.D F6,F0,F8 S.D F6,0(R1) SUB.D F8,F10,F14 Anti output ADD.D S,F0,F8 S.D S,0(R1) SUB.D T,F10,F14 MUL.D F6,F10,F8 MUL.D F6,F10,T Eliminating name dependency by register renaming (S,T). Note that any subsequent use of F8 should be replaced by T, there may be branches

3 Tomasulo Algorithm. Control & buffers distributed with Functional Units (FUs) Vs. centralized in Scoreboard: FU buffers are called reservation stations which have pending instructions (issued instructions) and operands and other instruction status info (including data dependencies). Register renaming is done by reservation stations fetching and buffering the operands of instructions waiting for issue. Reservations stations are sometimes referred to as physical registers or renaming registers as opposed to architecture registers specified by the ISA. Register renaming eliminates WAR, WAW hazards. If the data is not ready yet, the name of the reservation station that will provide them is kept. (In Chapter 3.2)

4 Tomasulo s Algorithm When successive writes to the same register, only the last write is performed. The information held at the reservation station at the functional unit determine when the instruction can start execution. Instruction results are sent directly from reservation stations to functional units via a common result bus (Common Data Bus CDB. in IBM360/91) thus bypassing intermediate registers. In pipelines with multiple execution units, and issuing multiple instructions per clock, more than one results bus will be needed.

5 Dynamic Scheduling: The Tomasulo Approach Dynamic Scheduling: The Tomasulo Approach Tomasulo s based MIPS including FP unit and load/store unit

6 Tomasulo s Algorithm Each reservation station holds the instructions that have been issued, and the operands if available, otherwise the name of the reservation station that will produce them. The load buffer and store buffers hold data/addresses that are going to the memory or coming from the memory, and behaves exactly like a reservation station. FP registers are connected by a pair of buses to the FU, and a single bus to the store buffers. All results from FU are sent to the CDB which goes everywhere except the load buffers.

7 Steps of Execution Issue: get the next instruction from the instruction queue (FIFO). If there is a matching RS that is empty issue the instruction with the operands if available, else stall (structural hazard). If the operands are not in the registers, keep track of the reservation station that will produce them, renaming eliminating WAR and WAW. Execute If one or more operands are not available, monitor the CDB waiting for it, until available, when sent on the CDB by the reservation station that produced it, copy it. When ready start execution (one instruction per cycle per FU). It is possible that more than one instruction in the same FU are ready in the same cycle!! Load/Store first, calculate the effective address if ready, load go ahead as soon as the memory unit is free, store wait fro the data to be stored. (in program order).

8 Steps of Execution To preserve exception behavior, no instruction can start execution until all the preceding branches are resolved. Before execution, we have to be sure that the branch prediction was correct before we start execution. Although, it is possible to record the exception instead of actually raising it. Write results: when the result is available, write it on the CDB and to registers and RS (including store buffers), also during this step, store write data to memory.

9 Data Structure The data structure used to detect and eliminate hazards are attached to the reservation stations, the register file, and the load/store unit. These units are tagged. These tags are names of an extended set of virtual registers used in renaming. In the example, 4 bits is enough to designate one of the 5 reservation stations or one of the 6 load buffers. (note that in the original 360 there was only 4 FP registers). Once the instruction is issued to the reservation station, it refer to the operand by the number of he RS that produces it. ) means the operand is already available

10 Reservation Station Fields Op Operation to perform in the unit (e.g., + or ) V j, V k Value of Source operands S1 and S2 Store buffers have a single V field indicating result to be stored. Only V i or Q i is valid for each operand. Q j, Q k Reservation stations producing operands. (value to be written). No ready flags as in Scoreboard; Qj,Qk=0 => ready. Store buffers only have Qi for RS producing result. A: Address information for loads or stores. Initially immediate field of instruction then effective address when calculated. Busy: Indicates reservation station is busy. Register result status: Q i Indicates which functional unit will write each register, if one exists. Blank (or 0) when no pending instructions exist that will write to that register.

11 Three Stages of Tomasulo Algorithm 1 Issue: Get instruction from pending Instruction Queue. Instruction issued to a free reservation station(rs) (no structural hazard). Selected RS is marked busy. Control sends available instruction operands values (from ISA registers) to assigned RS. Operands not available yet are renamed to RSs that will produce the operand (register renaming). 2 Execution (EX): Operate on operands. When both operands are ready then start executing on assigned FU. If all operands are not ready, watch Common Data Bus (CDB) for needed result (forwarding done via CDB). 3 Write result (WB): Finish execution. Write result on Common Data Bus (CDB) to all awaiting units (RSs) Mark reservation station as available. Common Data Bus (CDB): data + source ( come from bus): 64 bits for data + 4 bits for Functional Unit source address. Write data to waiting RS if source matches expected RS (that produces result). Does the result forwarding via broadcast to waiting RSs.

12 Toasulo s Algorithm. Issue EX Write L.D F6, 34(R2) L.D F2, 45(R3) MUL. D F0, F2, F4 SUB.D F8, F6, F2 DIV.D F10, F0, F6 ADD.D F6, F8, F2

13 Tomasulo s Algorithm Name Busy OP Vj Vk Qj Qk A LD1 N LD2 Y 45+Reg[3] Add1 Y - Mem[34+R2] LD2 Add2 Y + Add1 LD2 Add3 no MUL1 Y * REG0F4] LD2 MUL2 Y / Mem[34+..] MUL1 Field F0 F2 F4 F6 F8 F10 F12 Q i Mul1 LD2 Add2 Add1 Mul2

14 Steps in The Tomsulo Approach and Steps in The Tomsulo Approach and The Requirements of Each Step (In Chapter 3.2)

15 Drawbacks of The Tomasulo Approach Implementation Complexity: Example: The implementation of the Tomasulo algorithm may have caused delays in the introduction of 360/91, MIPS 10000, IBM 620 among other CPUs. Many high-speed associative result stores using (CDB) are required. Performance limited by Common Data Bus Possible solution: Multiple CDBs more Functional Unit and RS logic needed for parallel associative stores. (In Chapter 3.2)

16 Tomasulo Approach Example Using the same code used in the scoreboard example to be run on the Tomasulo configuration given earlier: # of RSs EX Cycles Integer 1 1 Floating Point Multiply/divide 2 10/40 Floating Point add 3 2 L.D F6, 34(R2) Pipelined Functional Units L.D F2, 45(R3) MUL. D F0, F2, F4 SUB.D F8, F6, F2 Real Data Dependence (RAW) Anti-dependence (WAR) Output Dependence (WAW) DIV.D F10, F0, F6 ADD.D F6, F8, F2 (In Chapter 3.3)

17 Tomasulo Example: Cycle 0 FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R2 Load1 No L.D F2 45+ R3 Load2 No MUL.D F0 F2 F4 Load3 No SUB.D F8 F6 F2 DIV.D F10 F0 F6 ADD.D F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No 0Add3 No 0Mult1 No 0Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 0 FU

18 Tomasulo Example Cycle 1 FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D L.D MUL.D SUB.D DIV.D ADD.D F6 34+ R2 1 Load1 No 34+R2 Yes F2 45+ R3 Load2 No F0 F2 F4 Load3 No F8 F6 F2 F10 F0 F6 F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No Add3 No 0Mult1 No 0Mult2 No Register result status Clock F0 F2 F4 F6 1 FU Load1 F8 F10 F12... F30

19 Tomasulo Example: Cycle 2 Instruction status Execution Write Instruction j k Issue complete Result Busy Address F6 34+ R2 1 2,- Load1 Yes 34+R2 F2 45+ R3 2 Load2 Yes 45+R3 F0 F2 F4 Load3 No F8 F6 F2 F10 F0 F6 F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No Add3 No 0Mult1 No 0Mult2 No Register result status L.D L.D MUL.D SUB.D DIV.D ADD.D Clock F0 F2 F4 F6 F8 F10 F12... F30 2 FU Load2 Load1

20 Tomasulo Example: Cycle 3 Instruction status Execution Write Instruction j k Issue complete Result Busy Address F6 34+ R2 1 2,3 Load1 Yes 34+R2 F2 45+ R3 2 Load2 Yes 45+R3 F0 F2 F4 3 Load3 No F8 F6 F2 F10 F0 F6 L.D L.D MUL.D SUB.D DIV.D ADD.D F6 F8 F2 Load processing takes 2 cycles (EX, Mem) Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No Add3 No 0Mult1 Yes MULTD R(F4) Load2 0Mult2 No Register result status Clock 3 F0 F2 F4 F6 F8 F10 F12... F30 FU Mult1 Load2 Load1

21 Tomasulo Example: Cycle 4 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R2 1 2,3 4 Load1 No L.D F2 45+ R3 2 3,4 Load2 Yes 45+R3 MUL.D F0 F2 F4 3 Load3 No SUB.D F8 F6 F2 4 DIV.D F10 F0 F6 ADD.D F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 Yes SUBD M(34+R2) Load2 0Add2 No Add3 No 0Mult1 Yes MULTD R(F4) Load2 0Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 4 FU Mult1 Load2 M(34+R2) Add1 Load2 completing; what is waiting for it?

22 Tomasulo Example: Cycle 5 FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R Load1 No L.D F2 45+ R Load2 No MUL.D F0 F2 F4 3 Load3 No SUB.D F8 F6 F2 4 DIV.D F10 F0 F6 5 ADD.D F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 2Add1 Yes SUBD M(34+R2) M(45+R3) 0Add2 No Add3 No 10 Mult1 Yes MULTD M(45+R3) R(F4) 0Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 5 FU Mult1 M(45+R3) M(34+R2) Add1 Mult2 Load2 result forwarded via CDB to Add1, Mult1 SUB.D, MUL.D execution will start next cycle 6

23 Tomasulo Example Cycle 6 FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D L.D MUL.D SUB.D DIV.D ADD.D F6 34+ R Load1 No F2 45+ R Load2 No F0 F2 F4 3 Start 6 Load3 No F8 F6 F2 4 Start 6 F10 F0 F6 5 F6 F8 F2 6 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 1Add1 Yes SUBD M(34+R2) M(45+R3) 0 Add2 Yes ADDD M(45+R3) Add1 Add3 No 9 Mult1 Yes MULTD M(45+R3) R(F4) 0Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 6 FU Mult1 M(45+R3) Add2 Add1 Mult2

24 Tomasulo Example: Cycle 7 FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40 Instruction status Execution Write Instruction j k Issue complete Result Busy Address F6 34+ R Load1 No F2 45+ R Load2 No L.D L.D MUL.D SUB.D DIV.D ADD.D F0 F2 F4 3 Start 6 Load3 No F8 F6 F2 4 6,7 F10 F0 F6 5 F6 F8 F2 6 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 Yes SUBD M(34+R2) M(45+R3) 0 Add2 Yes ADDD M(45+R3) Add1 Add3 No 8 Mult1 Yes MULTD M(45+R3) R(F4) 0Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 7 FU Mult1 M(45+R3) Add2 Add1 Mult2 RS Add1 completing; what is waiting for it?

25 Tomasulo Example: Cycle 10 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R Load1 No L.D F2 45+ R Load2 No MUL.D F0 F2 F4 3 Load3 No SUB.D F8 F6 F2 4 6,7 8 DIV.D F10 F0 F6 5 ADD.D F6 F8 F2 6 9,10 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0 Add2 Yes ADDD M() M() M(45+R3) 0Add3 No 5 Mult1 Yes MULTD M(45+R3) R(F4) 0Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 10 FU Mult1 M(45+R3) Add2 M() M() Mult2 RS Add2 completing; what is waiting for it?

26 Tomasulo Example: Cycle 11 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R Load1 No L.D F2 45+ R Load2 No MUL.D F0 F2 F4 3 Load3 No SUB.D F8 F6 F DIV.D F10 F0 F6 5 ADD.D F6 F8 F Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No 0Add3 No 4Mult1 Yes MULTDM(45+R3) R(F4) 0Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 11 FU Mult1 M(45+R3) (M-M)+M() M() M() Mult2 Write back result of ADD.D in this cycle

27 Tomasulo Example: Cycle 15 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R2 1 2,3 4 Load1 No L.D F2 45+ R3 2 3,4 5 Load2 No MUL.D F0 F2 F4 3 6,15 Load3 No SUB.D F8 F6 F DIV.D F10 F0 F6 5 ADD.D F6 F8 F Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No Add3 No 0 Mult1 Yes MULTD M(45+R3) R(F4) 0Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 15 FU Mult1 M(45+R3) (M M)+M() M() M() Mult2 Mult1 completing; what is waiting for it?

28 Tomasulo Example: Cycle 16 FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40 Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R Load1 No L.D F2 45+ R Load2 No MUL.D F0 F2 F Load3 No SUB.D F8 F6 F2 4 6,7 8 DIV.D F10 F0 F6 5 ADD.D F6 F8 F2 6 9,10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No Add3 No 0Mult1 No 40 Mult2 Yes DIVD M*F4 M(34+R2) Register result status Clock F0 F2 F4 F6 F8 F10 F12... F30 16 FU M*F4 M(45+R3) (M M)+M() M() M() Mult2 Only Divide instruction remains DIV.D execution will start next cycle (17)

29 Tomasulo Example: Cycle 57 (vs 62 cycles for scoreboard) Instruction status Execution Write Instruction j k Issue complete Result Busy Address L.D F6 34+ R Load1 No L.D F2 45+ R Load2 No MUL.D F0 F2 F Load3 No SUB.D F8 F6 F DIV.D F10 F0 F6 5 17,56 57 ADD.D F6 F8 F Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0Add1 No 0Add2 No Add3 No 0Mult1 No 0Mult2 No Register result status Instruction Block done Clock F0 F2 F4 F6 F8 F10 F12... F30 57 FU M*F4 M(45+R3) (M M)+M() M() M() M*F4/M Again we have: In-oder issue, Out-of-order execution, completion

Advanced Pipelining and Instruction-Level Paralelism (2)

Advanced Pipelining and Instruction-Level Paralelism (2) Advanced Pipelining and Instruction-Level Paralelism (2) Riferimenti bibliografici Computer architecture, a quantitative approach, Hennessy & Patterson: (Morgan Kaufmann eds.) Tomasulo s Algorithm For

More information

Instruction Level Parallelism and Its. (Part II) ECE 154B

Instruction Level Parallelism and Its. (Part II) ECE 154B Instruction Level Parallelism and Its Exploitation (Part II) ECE 154B Dmitri Strukov ILP techniques not covered last week this week next week Scoreboard Technique Review Allow for out of order execution

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 12: Dynamic Scheduling: Tomasulo s Algorithm Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CS252, UC Berkeley

More information

Scoreboard Limitations!

Scoreboard Limitations! Scoreboard Limitations! No forwarding read from register! Structural hazards stall at issue! WAW hazard stall at issue!! WAR hazard stall at write! Inf3 Computer Architecture - 2015-2016 1 Dynamic Scheduling

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

Lecture 16: Instruction Level Parallelism -- Dynamic Scheduling (OOO) via Tomasulo s Approach

Lecture 16: Instruction Level Parallelism -- Dynamic Scheduling (OOO) via Tomasulo s Approach Lecture 16: Instruction Level Parallelism -- Dynamic Scheduling (OOO) via Tomasulo s Approach CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu

More information

Scoreboard Limitations

Scoreboard Limitations Scoreboard Limitations! No forwarding read from register! Structural hazards stall at issue! WAW hazard stall at issue! WAR hazard stall at write Inf3 Computer Architecture - 2016-2017 1 Dynamic Scheduling

More information

CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm

CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm 2003-10-23 Dave Patterson (www.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs152/ CS 152 L17 Adv.

More information

Instruction Level Parallelism Part III

Instruction Level Parallelism Part III Course on: Advanced Computer Architectures Instruction Level Parallelism Part III Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Outline of Part III Dynamic Scheduling

More information

Dynamic Scheduling. Differences between Tomasulo. Tomasulo Algorithm. CDC 6600 scoreboard. Or ydanicm ceshuldngi

Dynamic Scheduling. Differences between Tomasulo. Tomasulo Algorithm. CDC 6600 scoreboard. Or ydanicm ceshuldngi Dynamic Scheduling (or out-of-order execution) Dynamic Scheduling Or ydanicm ceshuldngi CDC 6600 scoreboard Instruction storage added to each functional execution unit Instructions issue to FU when no

More information

Instruction Level Parallelism Part III

Instruction Level Parallelism Part III Course on: Advanced Computer Architectures Instruction Level Parallelism Part III Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Outline of Part III Tomasulo Dynamic Scheduling

More information

Differences between Tomasulo. Another Dynamic Algorithm: Tomasulo Organization. Reservation Station Components

Differences between Tomasulo. Another Dynamic Algorithm: Tomasulo Organization. Reservation Station Components Another Dynamic Algorithm: Tomasulo Algorithm Differences between Tomasulo Algorithm & Scoreboard For IBM 360/9 about 3 years after CDC 6600 Goal: High Performance without special compilers Differences

More information

DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO

DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

EEC 581 Computer Architecture. Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling)

EEC 581 Computer Architecture. Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling) 1 EEC 581 Computer Architecture Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling) Chansu Yu Electrical and Computer Engineering Cleveland State University Overview of Chap. 3 (again) Pipelined

More information

Out-of-Order Execution

Out-of-Order Execution 1 Out-of-Order Execution Several implementations out-of-order completion CDC 6600 with scoreboarding IBM 360/91 with Tomasulo s algorithm & reservation stations out-of-order completion leads to: imprecise

More information

Slide Set 8. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng

Slide Set 8. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng Slide Set 8 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017 ENCM 501 W17 Lectures: Slide

More information

Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng

Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng Slide Set 9 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 501 Winter 2018 Slide Set 9 slide

More information

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far.

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far. Outline 1 Reiteration Lecture 5: EIT090 Computer Architecture 2 Dynamic scheduling - Tomasulo Anders Ardö 3 Superscalar, VLIW EIT Electrical and Information Technology, Lund University Sept. 30, 2009 4

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Pipelining, Hazards Appendix C, HPe Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Pipelining

More information

Enhancing Performance in Multiple Execution Unit Architecture using Tomasulo Algorithm

Enhancing Performance in Multiple Execution Unit Architecture using Tomasulo Algorithm Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

CS 152 Midterm 2 May 2, 2002 Bob Brodersen

CS 152 Midterm 2 May 2, 2002 Bob Brodersen CS 152 Midterm 2 May 2, 2002 Bob Brodersen Name Solutions Show your work if you want partial credit! Try all the problems, don t get stuck on one of them. Each one is worth 10 points. 1) 2) 3) 4) 5) 6)

More information

06 1 MIPS Implementation Pipelined DLX and MIPS Implementations: Hardware, notation, hazards.

06 1 MIPS Implementation Pipelined DLX and MIPS Implementations: Hardware, notation, hazards. 06 1 MIPS Implementation 06 1 Material from Chapter 3 of H&P (for DLX). Material from Chapter 6 of P&H (for MIPS). line: (In this set.) Unpipelined DLX Implementation. (Diagram only.) Pipelined DLX and

More information

Very Short Answer: (1) (1) Peak performance does or does not track observed performance.

Very Short Answer: (1) (1) Peak performance does or does not track observed performance. Very Short Answer: (1) (1) Peak performance does or does not track observed performance. (2) (1) Which is more effective, dynamic or static branch prediction? (3) (1) Do benchmarks remain valid indefinitely?

More information

PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS

PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 1 submission

More information

An Adaptive Technique for Reducing Leakage and Dynamic Power in Register Files and Reorder Buffers

An Adaptive Technique for Reducing Leakage and Dynamic Power in Register Files and Reorder Buffers An Adaptive Technique for Reducing Leakage and Dynamic Power in Register Files and Reorder Buffers Shadi T. Khasawneh and Kanad Ghose Department of Computer Science State University of New York, Binghamton,

More information

EECS150 - Digital Design Lecture 9 - CPU Microarchitecture. CMOS Devices

EECS150 - Digital Design Lecture 9 - CPU Microarchitecture. CMOS Devices EECS150 - Digital Design Lecture 9 - CPU Microarchitecture Feb 17, 2009 John Wawrzynek Spring 2009 EECS150 - Lec9-cpu Page 1 CMOS Devices Review: Transistor switch-level models The gate acts like a capacitor.

More information

Pipeline design. Mehran Rezaei

Pipeline design. Mehran Rezaei Pipeline design Mehran Rezaei Shift Left 2 pc Opcode ExtOp Cont Unit RegDst Addr Addr2 Addr npcsle Reg ALUSrc Mem 2 OVF Branch ALUCtr MemtoReg Mem Funct Extension ALUOp ALU Cont Shift Left 2 ID EXE MEM

More information

Modeling Digital Systems with Verilog

Modeling Digital Systems with Verilog Modeling Digital Systems with Verilog Prof. Chien-Nan Liu TEL: 03-4227151 ext:34534 Email: jimmy@ee.ncu.edu.tw 6-1 Composition of Digital Systems Most digital systems can be partitioned into two types

More information

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7 CM 69 W4 Section Slide Set 6 slide 2/9 Contents Slide Set 6 for CM 69 Winter 24 Lecture Section Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary

More information

BUSES IN COMPUTER ARCHITECTURE

BUSES IN COMPUTER ARCHITECTURE BUSES IN COMPUTER ARCHITECTURE The processor, main memory, and I/O devices can be interconnected by means of a common bus whose primary function is to provide a communication path for the transfer of data.

More information

Sequential Logic Design CS 64: Computer Organization and Design Logic Lecture #14

Sequential Logic Design CS 64: Computer Organization and Design Logic Lecture #14 Sequential Logic Design CS 64: Computer Organization and Design Logic Lecture #14 Ziad Matni Dept. of Computer Science, UCSB Administrative Only 2.5 weeks left!!!!!!!! OMG!!!!! Th. 5/24 Sequential Logic

More information

A few questions to test your familiarity of Lab7 at the end of finishing all assigned parts of Lab 7

A few questions to test your familiarity of Lab7 at the end of finishing all assigned parts of Lab 7 EE457 Lab7 Questions page A few questions to test your familiarity of Lab7 at the end of finishing all assigned parts of Lab 7 1. A. In which parts or subparts of Lab 7 does the STALL signal cause the

More information

Tomasulo Algorithm Based Out of Order Execution Processor

Tomasulo Algorithm Based Out of Order Execution Processor Tomasulo Algorithm Based Out of Order Execution Processor Bhavana P.Shrivastava MAaulana Azad National Institute of Technology, Department of Electronics and Communication ABSTRACT In this research work,

More information

(12) United States Patent (10) Patent No.: US 6,249,855 B1

(12) United States Patent (10) Patent No.: US 6,249,855 B1 USOO6249855B1 (12) United States Patent (10) Patent No.: Farrell et al. (45) Date of Patent: *Jun. 19, 2001 (54) ARBITER SYSTEM FOR CENTRAL OTHER PUBLICATIONS PROCESSING UNIT HAVING DUAL DOMINOED ENCODERS

More information

On the Rules of Low-Power Design

On the Rules of Low-Power Design On the Rules of Low-Power Design (and How to Break Them) Prof. Todd Austin Advanced Computer Architecture Lab University of Michigan austin@umich.edu Once upon a time 1 Rules of Low-Power Design P = acv

More information

Go BEARS~ What are Machine Structures? Lecture #15 Intro to Synchronous Digital Systems, State Elements I C

Go BEARS~ What are Machine Structures? Lecture #15 Intro to Synchronous Digital Systems, State Elements I C CS6C L5 Intro to SDS, State Elements I () inst.eecs.berkeley.edu/~cs6c CS6C : Machine Structures Lecture #5 Intro to Synchronous Digital Systems, State Elements I 28-7-6 Go BEARS~ Albert Chae, Instructor

More information

Logic Devices for Interfacing, The 8085 MPU Lecture 4

Logic Devices for Interfacing, The 8085 MPU Lecture 4 Logic Devices for Interfacing, The 8085 MPU Lecture 4 1 Logic Devices for Interfacing Tri-State devices Buffer Bidirectional Buffer Decoder Encoder D Flip Flop :Latch and Clocked 2 Tri-state Logic Outputs

More information

A VLIW Processor for Multimedia Applications

A VLIW Processor for Multimedia Applications A VLIW Processor for Multimedia Applications E. Holmann T. Yoshida A. Yamada Y. Shimazu Mitsubishi Electric Corporation, System LSI Laboratory 4-1 Mizuhara, Itami, Hyogo 664, Japan Outline Objective System

More information

Slide Set 6. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 6. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 6 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2018 ENCM 369 Winter 2018 Section

More information

CHAPTER1: Digital Logic Circuits

CHAPTER1: Digital Logic Circuits CS224: Computer Organization S.KHABET CHAPTER1: Digital Logic Circuits 1 Sequential Circuits Introduction Composed of a combinational circuit to which the memory elements are connected to form a feedback

More information

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics 1) Explain why & how a MOSFET works VLSI Design: 2) Draw Vds-Ids curve for a MOSFET. Now, show how this curve changes (a) with increasing Vgs (b) with increasing transistor width (c) considering Channel

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 1-Bus Architecture and Datapath 10262011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline 1-Bus Microarchitecture and

More information

Review C program: foo.c Compiler Assembly program: foo.s Assembler Object(mach lang module): foo.o. Lecture #14

Review C program: foo.c Compiler Assembly program: foo.s Assembler Object(mach lang module): foo.o. Lecture #14 CS61C L14 Introduction to Synchronous Digital Systems (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #14 Introduction to Synchronous Digital Systems 2007-7-18 Scott Beamer, Instructor

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #14 Introduction to Synchronous Digital Systems 2007-7-18 Scott Beamer, Instructor CS61C L14 Introduction to Synchronous Digital Systems

More information

AN ABSTRACT OF THE THESIS OF

AN ABSTRACT OF THE THESIS OF AN ABSTRACT OF THE THESIS OF Licheng Zhang for the degree of Master of Science in Electrical and Computer Engineering presented on June 7, 1989. Title: The Design of A Reduced Instruction Set Computer

More information

Outline. EECS150 - Digital Design Lecture 27 - Asynchronous Sequential Circuits. Cross-coupled NOR gates. Asynchronous State Transition Diagram

Outline. EECS150 - Digital Design Lecture 27 - Asynchronous Sequential Circuits. Cross-coupled NOR gates. Asynchronous State Transition Diagram EECS150 - Digital Design Lecture 27 - Asynchronous Sequential Circuits Nov 26, 2002 John Wawrzynek Outline SR Latches and other storage elements Synchronizers Figures from Digital Design, John F. Wakerly

More information

EITF35: Introduction to Structured VLSI Design

EITF35: Introduction to Structured VLSI Design EITF35: Introduction to Structured VLSI Design Part 4.2.1: Learn More Liang Liu liang.liu@eit.lth.se 1 Outline Crossing clock domain Reset, synchronous or asynchronous? 2 Why two DFFs? 3 Crossing clock

More information

CSE140L: Components and Design Techniques for Digital Systems Lab. CPU design and PLDs. Tajana Simunic Rosing. Source: Vahid, Katz

CSE140L: Components and Design Techniques for Digital Systems Lab. CPU design and PLDs. Tajana Simunic Rosing. Source: Vahid, Katz CSE140L: Components and Design Techniques for Digital Systems Lab CPU design and PLDs Tajana Simunic Rosing Source: Vahid, Katz 1 Lab #3 due Lab #4 CPU design Today: CPU design - lab overview PLDs Updates

More information

Pipelining. Improve performance by increasing instruction throughput Program execution order. Data access. Instruction. fetch. Data access.

Pipelining. Improve performance by increasing instruction throughput Program execution order. Data access. Instruction. fetch. Data access. Chapter 6 Pipelining Improve performance by increasing instrction throghpt Program eection order Time (in instrctions) lw $, ($) Instrction fetch 2 4 6 8 2 4 6 8 ALU Data access lw $2, 2($) 8 ns Instrction

More information

Registers. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers

Registers. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers Unit 2 Registers and Counters Fundamentals of Logic esign EE2369 Prof. Eric Maconald Fall Semester 23 Registers Groups of flip-flops Can contain data format can be unsigned, 2 s complement and other more

More information

EE178 Spring 2018 Lecture Module 5. Eric Crabill

EE178 Spring 2018 Lecture Module 5. Eric Crabill EE178 Spring 2018 Lecture Module 5 Eric Crabill Goals Considerations for synchronizing signals Clocks Resets Considerations for asynchronous inputs Methods for crossing clock domains Clocks The academic

More information

Data flow architecture for high-speed optical processors

Data flow architecture for high-speed optical processors Data flow architecture for high-speed optical processors Kipp A. Bauchert and Steven A. Serati Boulder Nonlinear Systems, Inc., Boulder CO 80301 1. Abstract For optical processor applications outside of

More information

EECS150 - Digital Design Lecture 10 - Interfacing. Recap and Topics

EECS150 - Digital Design Lecture 10 - Interfacing. Recap and Topics EECS150 - Digital Design Lecture 10 - Interfacing Oct. 1, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)

More information

Microprocessor Design

Microprocessor Design Microprocessor Design Principles and Practices With VHDL Enoch O. Hwang Brooks / Cole 2004 To my wife and children Windy, Jonathan and Michelle Contents 1. Designing a Microprocessor... 2 1.1 Overview

More information

EE241 - Spring 2005 Advanced Digital Integrated Circuits

EE241 - Spring 2005 Advanced Digital Integrated Circuits EE241 - Spring 2005 Advanced Digital Integrated Circuits Lecture 21: Asynchronous Design Synchronization Clock Distribution Self-Timed Pipelined Datapath Req Ack HS Req Ack HS Req Ack HS Req Ack Start

More information

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview DATASHEET DC Ultra Concurrent Timing, Area, Power and Test Optimization DC Ultra RTL synthesis solution enables users to meet today s design challenges with concurrent optimization of timing, area, power

More information

Fill-in the following to understand stalling needs and forwarding opportunities

Fill-in the following to understand stalling needs and forwarding opportunities Fill-in the following to understand stalling needs and forwarding opportunities Instruction ADD4 ADD Receiving forwarding help Providing forwarding help Insists on Doesn t mind Doesn t mind Capable of

More information

Sequencing and Control

Sequencing and Control Sequencing and Control Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Spring, 2016 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/ Source:

More information

SHA-256 Module Specification

SHA-256 Module Specification SHA-256 Module Specification 1 Disclaimer Systemyde International Corporation reserves the right to make changes at any time, without notice, to improve design or performance and provide the best product

More information

Chapter 4 (Part I) The Processor. Baback Izadi Division of Engineering Programs

Chapter 4 (Part I) The Processor. Baback Izadi Division of Engineering Programs EGC442 Introdction to Compter Architectre Chapter 4 (Part I) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.ed Introdction CPU performance factors Instrction cont Determined

More information

Contents Circuits... 1

Contents Circuits... 1 Contents Circuits... 1 Categories of Circuits... 1 Description of the operations of circuits... 2 Classification of Combinational Logic... 2 1. Adder... 3 2. Decoder:... 3 Memory Address Decoder... 5 Encoder...

More information

Lecture 0: Organization

Lecture 0: Organization 581365 Tietokoneen rakenne Computer Organization II Spring 2010 Tiina Niklander Matemaattis-luonnontieteellinen tiedekunta Computer Organization II Advanced (master) level course! Prerequisite: Computer

More information

IT T35 Digital system desigm y - ii /s - iii

IT T35 Digital system desigm y - ii /s - iii UNIT - III Sequential Logic I Sequential circuits: latches flip flops analysis of clocked sequential circuits state reduction and assignments Registers and Counters: Registers shift registers ripple counters

More information

MC9211 Computer Organization

MC9211 Computer Organization MC9211 Computer Organization Unit 2 : Combinational and Sequential Circuits Lesson2 : Sequential Circuits (KSB) (MCA) (2009-12/ODD) (2009-10/1 A&B) Coverage Lesson2 Outlines the formal procedures for the

More information

EE178 Lecture Module 4. Eric Crabill SJSU / Xilinx Fall 2005

EE178 Lecture Module 4. Eric Crabill SJSU / Xilinx Fall 2005 EE178 Lecture Module 4 Eric Crabill SJSU / Xilinx Fall 2005 Lecture #9 Agenda Considerations for synchronizing signals. Clocks. Resets. Considerations for asynchronous inputs. Methods for crossing clock

More information

CS/ECE 250: Computer Architecture. Basics of Logic Design: ALU, Storage, Tristate. Benjamin Lee

CS/ECE 250: Computer Architecture. Basics of Logic Design: ALU, Storage, Tristate. Benjamin Lee CS/ECE 25: Computer Architecture Basics of Logic esign: ALU, Storage, Tristate Benjamin Lee Slides based on those from Alvin Lebeck, aniel, Andrew Hilton, Amir Roth, Gershon Kedem Homework #3 ue Mar 7,

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 24 State Circuits : Circuits that Remember Senior Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Bio NAND gate Researchers at Imperial

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

MISO - EPG DATA QUALITY INVESTIGATION

MISO - EPG DATA QUALITY INVESTIGATION MISO - EPG DATA QUALITY INVESTIGATION Ken Martin Electric Power Group Kevin Frankeny, David Kapostasy, Anna Zwergel MISO Outline Case 1 noisy frequency signal Resolution limitations Case 2 noisy frequency

More information

UNIVERSITY OF TORONTO JOÃO MARCUS RAMOS BACALHAU GUSTAVO MAIA FERREIRA HEYANG WANG ECE532 FINAL DESIGN REPORT HOLE IN THE WALL

UNIVERSITY OF TORONTO JOÃO MARCUS RAMOS BACALHAU GUSTAVO MAIA FERREIRA HEYANG WANG ECE532 FINAL DESIGN REPORT HOLE IN THE WALL UNIVERSITY OF TORONTO JOÃO MARCUS RAMOS BACALHAU GUSTAVO MAIA FERREIRA HEYANG WANG ECE532 FINAL DESIGN REPORT HOLE IN THE WALL Toronto 2015 Summary 1 Overview... 5 1.1 Motivation... 5 1.2 Goals... 5 1.3

More information

Multicore Design Considerations

Multicore Design Considerations Multicore Design Considerations Multicore: The Forefront of Computing Technology We re not going to have faster processors. Instead, making software run faster in the future will mean using parallel programming

More information

EECS150 - Digital Design Lecture 3 Synchronous Digital Systems Review. Announcements

EECS150 - Digital Design Lecture 3 Synchronous Digital Systems Review. Announcements EECS150 - Digital Design Lecture 3 Synchronous Digital Systems Review September 1, 2011 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http://www-inst.eecs.berkeley.edu/~cs150

More information

4.5 Pipelining. Pipelining is Natural!

4.5 Pipelining. Pipelining is Natural! 4.5 Pipelining Ovelapped execution of instuctions Instuction level paallelism (concuency) Example pipeline: assembly line ( T Fod) Response time fo any instuction is the same Instuction thoughput inceases

More information

Chapter 4. Logic Design

Chapter 4. Logic Design Chapter 4 Logic Design 4.1 Introduction. In previous Chapter we studied gates and combinational circuits, which made by gates (AND, OR, NOT etc.). That can be represented by circuit diagram, truth table

More information

THE USE OF forward error correction (FEC) in optical networks

THE USE OF forward error correction (FEC) in optical networks IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 8, AUGUST 2005 461 A High-Speed Low-Complexity Reed Solomon Decoder for Optical Communications Hanho Lee, Member, IEEE Abstract

More information

Figure 1: Feature Vector Sequence Generator block diagram.

Figure 1: Feature Vector Sequence Generator block diagram. 1 Introduction Figure 1: Feature Vector Sequence Generator block diagram. We propose designing a simple isolated word speech recognition system in Verilog. Our design is naturally divided into two modules.

More information

Lab2: Cache Memories. Dimitar Nikolov

Lab2: Cache Memories. Dimitar Nikolov Lab2: Cache Memories Dimitar Nikolov Goal Understand how cache memories work Learn how different cache-mappings impact CPU time Leran how different cache-sizes impact CPU time Lund University / Electrical

More information

Logic Design. Flip Flops, Registers and Counters

Logic Design. Flip Flops, Registers and Counters Logic Design Flip Flops, Registers and Counters Introduction Combinational circuits: value of each output depends only on the values of inputs Sequential Circuits: values of outputs depend on inputs and

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #21 State Elements: Circuits that Remember 2008-3-14 Scott Beamer, Guest Lecturer www.piday.org 3.14159265358979323 8462643383279502884

More information

Sequential Logic. Introduction to Computer Yung-Yu Chuang

Sequential Logic. Introduction to Computer Yung-Yu Chuang Sequential Logic Introduction to Computer Yung-Yu Chuang with slides by Sedgewick & Wayne (introcs.cs.princeton.edu), Nisan & Schocken (www.nand2tetris.org) and Harris & Harris (DDCA) Review of Combinational

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 21 State Elements : Circuits that Remember 2007-03-07 Mocha sipping TA Valerie Ishida inst.eecs.berkeley.edu/~cs61c-td 161 Exabytes

More information

Bubble Razor An Architecture-Independent Approach to Timing-Error Detection and Correction

Bubble Razor An Architecture-Independent Approach to Timing-Error Detection and Correction 1 Bubble Razor An Architecture-Independent Approach to Timing-Error Detection and Correction Matthew Fojtik, David Fick, Yejoong Kim, Nathaniel Pinckney, David Harris, David Blaauw, Dennis Sylvester mfojtik@umich.edu

More information

An automatic synchronous to asynchronous circuit convertor

An automatic synchronous to asynchronous circuit convertor An automatic synchronous to asynchronous circuit convertor Charles Brej Abstract The implementation methods of asynchronous circuits take time to learn, they take longer to design and verifying is very

More information

Block Diagram. dw*3 pixin (RGB) pixin_vsync pixin_hsync pixin_val pixin_rdy. clk_a. clk_b. h_s, h_bp, h_fp, h_disp, h_line

Block Diagram. dw*3 pixin (RGB) pixin_vsync pixin_hsync pixin_val pixin_rdy. clk_a. clk_b. h_s, h_bp, h_fp, h_disp, h_line Key Design Features Block Diagram Synthesizable, technology independent IP Core for FPGA, ASIC and SoC reset underflow Supplied as human readable VHDL (or Verilog) source code Simple FIFO input interface

More information

CHAPTER 4 RESULTS & DISCUSSION

CHAPTER 4 RESULTS & DISCUSSION CHAPTER 4 RESULTS & DISCUSSION 3.2 Introduction This project aims to prove that Modified Baugh-Wooley Two s Complement Signed Multiplier is one of the high speed multipliers. The schematic of the multiplier

More information

ELEN Electronique numérique

ELEN Electronique numérique ELEN0040 - Electronique numérique Patricia ROUSSEAUX Année académique 2014-2015 CHAPITRE 6 Registers and Counters ELEN0040 6-277 Design of a modulo-8 binary counter using JK Flip-flops 3 bits are required

More information

FPGA Design. Part I - Hardware Components. Thomas Lenzi

FPGA Design. Part I - Hardware Components. Thomas Lenzi FPGA Design Part I - Hardware Components Thomas Lenzi Approach We believe that having knowledge of the hardware components that compose an FPGA allow for better firmware design. Being able to visualise

More information

Performance Driven Reliable Link Design for Network on Chips

Performance Driven Reliable Link Design for Network on Chips Performance Driven Reliable Link Design for Network on Chips Rutuparna Tamhankar Srinivasan Murali Prof. Giovanni De Micheli Stanford University Outline Introduction Objective Logic design and implementation

More information

Spiral Content Mapping. Spiral 2 1. Learning Outcomes DATAPATH COMPONENTS. Datapath Components: Counters Adders Design Example: Crosswalk Controller

Spiral Content Mapping. Spiral 2 1. Learning Outcomes DATAPATH COMPONENTS. Datapath Components: Counters Adders Design Example: Crosswalk Controller -. -. piral Content Mapping piral Theory Combinational Design equential Design ystem Level Design Implementation and Tools Project piral Performance metrics (latency vs. throughput) Boolean Algebra Canonical

More information

MPEG decoder Case. K.A. Vissers UC Berkeley Chamleon Systems Inc. and Pieter van der Wolf. Philips Research Eindhoven, The Netherlands

MPEG decoder Case. K.A. Vissers UC Berkeley Chamleon Systems Inc. and Pieter van der Wolf. Philips Research Eindhoven, The Netherlands MPEG decoder Case K.A. Vissers UC Berkeley Chamleon Systems Inc. and Pieter van der Wolf Philips Research Eindhoven, The Netherlands 1 Outline Introduction Consumer Electronics Kahn Process Networks Revisited

More information

FPGA Prototyping using Behavioral Synthesis for Improving Video Processing Algorithm and FHD TV SoC Design Masaru Takahashi

FPGA Prototyping using Behavioral Synthesis for Improving Video Processing Algorithm and FHD TV SoC Design Masaru Takahashi FPGA Prototyping using Behavioral Synthesis for Improving Video Processing Algorithm and FHD TV SoC Design Masaru Takahashi SoC Software Platform Division, Renesas Electronics Corporation January 28, 2011

More information

Power Reduction Techniques for a Spread Spectrum Based Correlator

Power Reduction Techniques for a Spread Spectrum Based Correlator Power Reduction Techniques for a Spread Spectrum Based Correlator David Garrett (garrett@virginia.edu) and Mircea Stan (mircea@virginia.edu) Center for Semicustom Integrated Systems University of Virginia

More information

Block Diagram. 16/24/32 etc. pixin pixin_sof pixin_val. Supports 300 MHz+ operation on basic FPGA devices 2 Memory Read/Write Arbiter SYSTEM SIGNALS

Block Diagram. 16/24/32 etc. pixin pixin_sof pixin_val. Supports 300 MHz+ operation on basic FPGA devices 2 Memory Read/Write Arbiter SYSTEM SIGNALS Key Design Features Block Diagram Synthesizable, technology independent IP Core for FPGA, ASIC or SoC Supplied as human readable VHDL (or Verilog) source code Output supports full flow control permitting

More information

CprE 281: Digital Logic

CprE 281: Digital Logic CprE 28: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Registers and Counters CprE 28: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev

More information

CSE 352 Laboratory Assignment 3

CSE 352 Laboratory Assignment 3 CSE 352 Laboratory Assignment 3 Introduction to Registers The objective of this lab is to introduce you to edge-trigged D-type flip-flops as well as linear feedback shift registers. Chapter 3 of the Harris&Harris

More information

Dual Link DVI Receiver Implementation

Dual Link DVI Receiver Implementation Dual Link DVI Receiver Implementation This application note describes some features of single link receivers that must be considered when using 2 devices for a dual link application. Specific characteristics

More information

Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques

Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Madhavi Anupoju 1, M. Sunil Prakash 2 1 M.Tech (VLSI) Student, Department of Electronics & Communication Engineering, MVGR

More information

HIGH SPEED ASYNCHRONOUS DATA MULTIPLEXER/ DEMULTIPLEXER FOR HIGH DENSITY DIGITAL RECORDERS

HIGH SPEED ASYNCHRONOUS DATA MULTIPLEXER/ DEMULTIPLEXER FOR HIGH DENSITY DIGITAL RECORDERS HIGH SPEED ASYNCHRONOUS DATA MULTIPLEXER/ DEMULTIPLEXER FOR HIGH DENSITY DIGITAL RECORDERS Mr. Albert Berdugo Mr. Martin Small Aydin Vector Division Calculex, Inc. 47 Friends Lane P.O. Box 339 Newtown,

More information

WWW.STUDENTSFOCUS.COM + Class Subject Code Subject Prepared By Lesson Plan for Time: Lesson. No 1.CONTENT LIST: Introduction to Unit III 2. SKILLS ADDRESSED: Listening I year, 02 sem CS6201 Digital Principles

More information

OUT-OF-ORDER processors with precise exceptions

OUT-OF-ORDER processors with precise exceptions TRANSACTIONS ON COMPUTER, VOL. X, NO. Y, FEBRUARY 2009 1 Store Buffer Design for Multibanked Data Caches Enrique Torres, Member, IEEE, Pablo Ibáñez, Member, IEEE, Víctor Viñals-Yúfera, Member, IEEE, and

More information