Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng

Size: px
Start display at page:

Download "Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng"

Transcription

1 Slide Set 9 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018

2 ENCM 501 Winter 2018 Slide Set 9 slide 2/42 Contents Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

3 ENCM 501 Winter 2018 Slide Set 9 slide 3/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

4 ENCM 501 Winter 2018 Slide Set 9 slide 4/42 Overview of Tomasulo s algorithm It s interesting that this approach to instruction scheduling was developed around 50 years ago (in 1966!) for very high-end expensive computers, then re-adopted around 20 years ago for consumer-level processors (Intel Pentium Pro, 1995). It s been in continual use for a huge number of out-of-order processor designs for the last two decades. The key ideas are: try to get execution of an instruction started as soon as the source operands are ready often, this will require out-of-order issue; as soon as an instruction result has been produced, try to broadcast that result to all the instructions that want to consume it.

5 ENCM 501 Winter 2018 Slide Set 9 slide 5/42 Goals and non-goals for H&P Sections 3.4 and 3.5 Goals: high instruction throughput dealing effectively with RAW hazards (which, remember, may occur even with in-order processing) dealing effectively with WAW and WAR hazards (which tend to occur with out-of-order processing) Non-goals (which become goals in Section 3.6): high throughput in the face of code with lots of branches correct behaviour in the face of exceptions, e.g., hardware interrupts, TLB misses handled by software, various other exceptions

6 ENCM 501 Winter 2018 Slide Set 9 slide 6/42 Key components instruction fetch and decode unit enhanced register files reservation stations functional units common data bus (CDB) or multiple CDBs for superscalar systems

7 ENCM 501 Winter 2018 Slide Set 9 slide 7/42 Instruction fetch and decode unit This unit merges an L1 I-cache some sort of facility for managing branches and jumps (kind of fuzzy in Sections , but definitely dynamic branch prediction in Section 3.6) instruction decode capability: when an instruction leaves, it s known exactly what kind of instruction it is, and which registers, offsets and immediate operands are involved For textbook Sections , this unit is scalar in any one clock cycle, the maximum output is one instruction. (Section 3.8 looks at output of two or more instructions per cycle.)

8 ENCM 501 Winter 2018 Slide Set 9 slide 8/42 Instruction fetch and decode unit: key property Instructions are issued from the instruction unit in program order. This is critical for correct avoidance of data hazards!

9 ENCM 501 Winter 2018 Slide Set 9 slide 9/42 Enhanced register files In an in-order processor, a register file is precisely what we ve modeled already: If the number of registers is M and the width of a register is N, there must be M N cells to contain the state of the register file; beyond that, there must also be whatever logic is needed to support parallel reads and/or parallel writes. In a Tomasulo-based processor, each one of the M registers requires N bits for register data more bits for register status is the data up-to-date, and if not, which reservation station will supply the data in the future?

10 ENCM 501 Winter 2018 Slide Set 9 slide 10/42 Example: FPR file for MIPS with bit FPRs... Qi = 0 indicates that an FPR is up-to-date; Qi 0 indicates that an FPR is not up-to-date and is waiting for a result for a reservation station. This example works for a system with up to fifteen reservation stations... FPR data Qi F F F

11 ENCM 501 Winter 2018 Slide Set 9 slide 11/42 What do all of the bits on Slide 10 mean? Let s write out the FPR file state in a somewhat more human-friendly format.

12 ENCM 501 Winter 2018 Slide Set 9 slide 12/42 Reservation stations A reservation station receives an instruction from the instruction unit; waits for source operand data to be ready before starting the execution of the instruction; broadcasts the result of the instruction on the CDB, when the result is ready. Example reservation station: An RS capable of processing either ADD.D or SUB.D needs 6 fields: Busy, Op, Vj, Vk, Qj and Qk. (The textbook also shows an A field, but that field is not needed for an RS for ADD.D / SUB.D.) Let s make some notes about how the 6 fields are used.

13 ENCM 501 Winter 2018 Slide Set 9 slide 13/42 Each reservation station has a unique, nonzero identification number. Examples in the textbook have three RS s for ADD.D / SUB.D instructions, and have two RS s for MUL.D / DIV.D instructions. So a possible numbering scheme would be 0001, 0010, 0011 for the first three RS s (called Add1, Add2, and Add3 in the book) and 0100, 0101 for the next two (called Mult1 and Mult2). (Warning: RS stands for reservation station, not for register status. In descriptions of various versions of Tomasulo s algororithm, the textbook uses RegisterStat[x] as an abbreviation for the status of register x.)

14 ENCM 501 Winter 2018 Slide Set 9 slide 14/42 Simple example of instruction issue Suppose a program has been running for a while, and these will be the next two instructions to leave the instruction unit: ADD.D SUB.D F4, F0, F2 F6, F6, F4 Suppose also that registers F0, F2, F4 and F6 are up-to-date, and that RS s Add1 and Add2 are not busy. Let s make notes about how the two instructions will be issued to the RS s.

15 ENCM 501 Winter 2018 Slide Set 9 slide 15/42 Functional units Functional units are the circuits that perform the execution steps for an instruction. Example functional units are FP adders, FP multipliers, integer ALUs, shifters, and so on. To keep things relatively simple, we can imagine a one-to-one correspondence between reservation stations and functional units. For example, each RS for ADD.D / SUB.D could be thought of as guarding the entrance of an FP adder-subtractor circuit, and watching the exit of that circuit for a result. In reality, things will be more complicated multiple RS s will be set up to feed their operands into a single pipelined functional unit.

16 ENCM 501 Winter 2018 Slide Set 9 slide 16/42 Latencies of functional units Tomasulo s algorithm is designed to manage the effects of multiple-cycle latencies of functional units. Further, the algorithm is designed to deal with the fact that the latency of a functional unit might vary from one use of the unit to the next. What are some examples of functional units with variable latencies?

17 ENCM 501 Winter 2018 Slide Set 9 slide 17/42 CDB: Common data bus Most reservation stations are capable of broadcasting results on the CDB. RS s for arithmetic instructions and RS s for loads certainly need to be able to do this, but RS s for store instructions don t. Each reservation station must snoop the CDB watch the CDB for results needed by that RS. The register file must also snoop the CDB to grab results that are needed by registers that are currently not up-to-date. (Remember, such a register has some nonzero Qi value to indicate which RS will produce the result the register is waiting for.)

18 ENCM 501 Winter 2018 Slide Set 9 slide 18/42 Example instruction completion via CDB This sequence got started a few slides back... ADD.D SUB.D F4, F0, F2 F6, F6, F4 Let s make some notes about how these instructions will get completed. For simplicity, let s assume that the given instructions are followed by a lengthy sequence of instructions that don t use F0, F2, F4 or F6.

19 ENCM 501 Winter 2018 Slide Set 9 slide 19/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

20 ENCM 501 Winter 2018 Slide Set 9 slide 20/42 Tomasulo s algorithm and name dependencies The algorithm eliminates WAW and WAR hazards in a really elegant way. To understand how this works, it s sufficient to consider a short example. Here the repeated use of F0 creates both a potential WAW hazard and a potential WAR hazard: DIV.D ADD.D MUL.D SUB.D F0, F20, F2 F22, F22, F0 F0, F24, F24 F26, F26, F0 Let s make notes about how the hazards are eliminated.

21 ENCM 501 Winter 2018 Slide Set 9 slide 21/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

22 ENCM 501 Winter 2018 Slide Set 9 slide 22/42 Tomasulo s algorithm and memory hazards The examples given in textbook Sections 3.5 and 3.6 are excellent regarding hazards involving communication of instruction results through floating-point registers (FPRs). Communication of instruction results through general-purpose registers (GPRs, aka integer registers) is very similar to communication via FPRs, so it s reasonable not to discuss that topic at length.

23 ENCM 501 Winter 2018 Slide Set 9 slide 23/42 However, the textbook, is, uh, less than excellent about the topic of allowing memory accesses (loads and stores) to complete out-of-order when doing so is harmless and helps with instruction throughput; forcing in-order completion of loads and stores when necessary to avoid RAW, WAW and WAR hazards.

24 ENCM 501 Winter 2018 Slide Set 9 slide 24/42 Older and younger instructions It s handy to use the words older and younger as adjectives for instructions in an out-of-order system. Instruction A is older than Instruction B if A came before B in program order. In that case, you could also say that B is younger than A.

25 ENCM 501 Winter 2018 Slide Set 9 slide 25/42 Rules for ordering execution of loads and stores To avoid RAW hazards: It is acceptable to access memory for a load instruction if it is known that there are no incomplete older store instructions that will use the same address as the load. What similar rules would be needed for avoidance of WAW and WAR hazards? ENCM 501 won t go into details of hardware solutions for memory data hazards, but here are a couple of key features: some sort of queue in which program order of loads and stores is remembered after loads and stores leave the instuction fetch/decode unit a capability to quickly compare the address a load or store will use with addresses to be used by older loads or stores

26 ENCM 501 Winter 2018 Slide Set 9 slide 26/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

27 ENCM 501 Winter 2018 Slide Set 9 slide 27/42 Load and store buffers Load buffer and store buffer are names given to reservation stations dedicated to handling loads or stores. Remark: The load buffers and store buffers provide an interface between the execution unit of the processor and the data caches. That s an interesting design problem we don t have time to study in this course.

28 ENCM 501 Winter 2018 Slide Set 9 slide 28/42 Vj, Vk, Qj, Qk, A for store buffers Busy Op Vj Vk Qj Qk A As with the FP math stations, Vj is ready if and only if Qj = 0, and the same applies for Vk and Qk. Vk is used for the FP data to be written in an S.D instruction. So what does it mean if Qk 0? Vj, Qj, and A have to do with memory address calculations. Let s not worry about the details for now.

29 ENCM 501 Winter 2018 Slide Set 9 slide 29/42 Vj, Vk, Qj, Qk, A for load buffers Busy Op Vj Vk Qj Qk A Again, Vj is ready if and only if Qj = 0, and the same applies for Vk and Qk. Vj, Qj, and A have to do with memory address calculations. As with store buffers, let s not worry about the details for now.

30 ENCM 501 Winter 2018 Slide Set 9 slide 30/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

31 ENCM 501 Winter 2018 Slide Set 9 slide 31/42 A loop example This is from page 179 of the textbook: Loop: L.D F0, 0(R1) MUL.D F4, F0, F2 S.D F4, 0(R1) DADDIU R1, R1, -8 BNE R1, R2, Loop1 Let s make some notes about the DADDIU and BEQ instructions. Let s assume the loop starts with R1 = 0x and R2 = 0x Let s trace how Tomasulo s algorithm might handle the first two passes through the loop.

32 ENCM 501 Winter 2018 Slide Set 9 slide 32/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

33 ENCM 501 Winter 2018 Slide Set 9 slide 33/42 Costs of the CDB (common data bus) In a typical clock cycle, some reservation station will broadcast a result on the CDB, and other reservation stations and the register file will look at the result to see if it s useful. Transmitting the result and receiving the result both have energy costs. A complex instruction unit, reservation stations, and related hardware require lots of transistors. If Moore s law had not applied for so many decades, we would not see Tomasulo s algorithm used as a basis for design of modestly priced processor chips. It s possible, in some cycles, that two or more reservation stations will simultaneously try to broadcast their results. Why is this not a fatal defect in Tomasulo s algorithm?

34 ENCM 501 Winter 2018 Slide Set 9 slide 34/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

35 ENCM 501 Winter 2018 Slide Set 9 slide 35/42 Tomasulo s algorithm and branch prediction Consider this code fragment: BEQ S.D ADD.D R8, R0, L99 F0, (R10) F2, F2, F4 Suppose the branch is incorrectly predicted as not taken, and S.D and ADD.D get issued while BEQ waits for some earlier instruction to provide a value for R8. If Tomasulo s algorithm does nothing beyond what has been presented so far in lectures, what will prevent S.D from making an incorrect update to memory, and what will prevent ADD.D from making an incorrect update to F2?

36 ENCM 501 Winter 2018 Slide Set 9 slide 36/42 Tomasulo s algorithm and exceptions MUL.D S.D SUB.D L.D ADD.D F2, F4, F6 F2, 0(R8) F0, F12, F14 F2, 0(R9) F8, F8, F2 Suppose MUL.D gets delayed because it has to wait until a result for F6 is ready. That will delay the execution of S.D. Meanwhile, Tomasulo s algorithm may allow completion of SUB.D, L.D, and ADD.D. What kind of problem is created if S.D eventually results in a page fault exception?

37 ENCM 501 Winter 2018 Slide Set 9 slide 37/42 Outline of Slide Set 9 Overview of Tomasulo s algorithm Tomasulo s algorithm and name dependencies Tomasulo s algorithm and memory hazards Load and store buffers A loop example Costs of the CDB (common data bus) Tomasulo s algorithm, branches, and exceptions Out-of-order execution, in-order completion

38 ENCM 501 Winter 2018 Slide Set 9 slide 38/42 Out-of-order execution, in-order completion The version of Tomasulo s algorithm presented in textbook Section 3.5 has scalar issue (that is, at most one instruction issued per clock cycle), out-of-order execution, and out-of-order completion. Section 3.6 modifies the algorithm to include a circuit called a reorder buffer (ROB), which will enforce in-order completion. Use of a reorder buffer solves the branch prediction and exception problems described on slides 35 and 36.

39 ENCM 501 Winter 2018 Slide Set 9 slide 39/42 In a processor with a reorder buffer, issue of an instruction sends information related to the instruction both to a reservation station and to the reorder buffer. A reservation station for a store is responsible for address computation only it is not allowed to write to memory. The reorder buffer is a FIFO queue instructions enter in program order, and leave in program order. When an instruction gets to the head of the ROB, it can be committed as soon as its results are known. Examples: An ADD.D can be committed if a reservation station has provided the sum to the reorder buffer. An S.D can be committed if both the data to be stored and the address to be used are ready.

40 ENCM 501 Winter 2018 Slide Set 9 slide 40/42 Register file changes: The Qi field for each register is replaced by a Busy flag and a Reorder # field. Busy = 0 means the register is up-to-date; Busy = 1 means the register is waiting for a result from whatever entry in the reorder buffer matches the Reorder #. The register file does not watch the CDB for results. The ROB must watch the CDB for results for all of the instructions within the ROB that don t yet have results.

41 ENCM 501 Winter 2018 Slide Set 9 slide 41/42 The reservation stations and functional units work very much as before, except: the Qj and Qk fields hold ROB entry numbers instead of reservation station numbers; each reservation stations has a Dest field to hold an ROB entry number; when a reservation station broadcasts its result on the CDB, it includes the Dest field value to help both the ROB and the other reservation stations.

42 ENCM 501 Winter 2018 Slide Set 9 slide 42/42 The reorder buffer and safe speculation The key point about the ROB is that it can collect a large number of results without knowing whether those results should really be written to registers or memory. Consider a branch instruction that is mispredicted as taken. What happens to all the instructions that got into the ROB before the branch? What happens to the branch target instruction, the successor of the the branch target instruction, etc., which got into the ROB after the branch? The bad effect of the above scenario is a waste of time and energy. What are the important bad effects that were prevented?

Slide Set 8. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng

Slide Set 8. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng Slide Set 8 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017 ENCM 501 W17 Lectures: Slide

More information

Instruction Level Parallelism and Its. (Part II) ECE 154B

Instruction Level Parallelism and Its. (Part II) ECE 154B Instruction Level Parallelism and Its Exploitation (Part II) ECE 154B Dmitri Strukov ILP techniques not covered last week this week next week Scoreboard Technique Review Allow for out of order execution

More information

Advanced Pipelining and Instruction-Level Paralelism (2)

Advanced Pipelining and Instruction-Level Paralelism (2) Advanced Pipelining and Instruction-Level Paralelism (2) Riferimenti bibliografici Computer architecture, a quantitative approach, Hennessy & Patterson: (Morgan Kaufmann eds.) Tomasulo s Algorithm For

More information

Tomasulo Algorithm. Developed at IBM and first implemented in IBM s 360/91

Tomasulo Algorithm. Developed at IBM and first implemented in IBM s 360/91 Tomasulo Algorithm Developed at IBM and first implemented in IBM s 360/91 IBM wanted to use the existing compiler instead of a specialized compiler for high end machines. Tracks when operands are available

More information

Scoreboard Limitations

Scoreboard Limitations Scoreboard Limitations! No forwarding read from register! Structural hazards stall at issue! WAW hazard stall at issue! WAR hazard stall at write Inf3 Computer Architecture - 2016-2017 1 Dynamic Scheduling

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

Scoreboard Limitations!

Scoreboard Limitations! Scoreboard Limitations! No forwarding read from register! Structural hazards stall at issue! WAW hazard stall at issue!! WAR hazard stall at write! Inf3 Computer Architecture - 2015-2016 1 Dynamic Scheduling

More information

DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO

DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO DYNAMIC INSTRUCTION SCHEDULING WITH TOMASULO Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 12: Dynamic Scheduling: Tomasulo s Algorithm Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CS252, UC Berkeley

More information

Lecture 16: Instruction Level Parallelism -- Dynamic Scheduling (OOO) via Tomasulo s Approach

Lecture 16: Instruction Level Parallelism -- Dynamic Scheduling (OOO) via Tomasulo s Approach Lecture 16: Instruction Level Parallelism -- Dynamic Scheduling (OOO) via Tomasulo s Approach CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu

More information

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far.

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far. Outline 1 Reiteration Lecture 5: EIT090 Computer Architecture 2 Dynamic scheduling - Tomasulo Anders Ardö 3 Superscalar, VLIW EIT Electrical and Information Technology, Lund University Sept. 30, 2009 4

More information

EEC 581 Computer Architecture. Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling)

EEC 581 Computer Architecture. Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling) 1 EEC 581 Computer Architecture Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling) Chansu Yu Electrical and Computer Engineering Cleveland State University Overview of Chap. 3 (again) Pipelined

More information

CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm

CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm 2003-10-23 Dave Patterson (www.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs152/ CS 152 L17 Adv.

More information

Out-of-Order Execution

Out-of-Order Execution 1 Out-of-Order Execution Several implementations out-of-order completion CDC 6600 with scoreboarding IBM 360/91 with Tomasulo s algorithm & reservation stations out-of-order completion leads to: imprecise

More information

Dynamic Scheduling. Differences between Tomasulo. Tomasulo Algorithm. CDC 6600 scoreboard. Or ydanicm ceshuldngi

Dynamic Scheduling. Differences between Tomasulo. Tomasulo Algorithm. CDC 6600 scoreboard. Or ydanicm ceshuldngi Dynamic Scheduling (or out-of-order execution) Dynamic Scheduling Or ydanicm ceshuldngi CDC 6600 scoreboard Instruction storage added to each functional execution unit Instructions issue to FU when no

More information

Instruction Level Parallelism Part III

Instruction Level Parallelism Part III Course on: Advanced Computer Architectures Instruction Level Parallelism Part III Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Outline of Part III Tomasulo Dynamic Scheduling

More information

Instruction Level Parallelism Part III

Instruction Level Parallelism Part III Course on: Advanced Computer Architectures Instruction Level Parallelism Part III Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Outline of Part III Dynamic Scheduling

More information

Differences between Tomasulo. Another Dynamic Algorithm: Tomasulo Organization. Reservation Station Components

Differences between Tomasulo. Another Dynamic Algorithm: Tomasulo Organization. Reservation Station Components Another Dynamic Algorithm: Tomasulo Algorithm Differences between Tomasulo Algorithm & Scoreboard For IBM 360/9 about 3 years after CDC 6600 Goal: High Performance without special compilers Differences

More information

Slide Set 6. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 6. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 6 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2018 ENCM 369 Winter 2018 Section

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Pipelining, Hazards Appendix C, HPe Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Pipelining

More information

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7 CM 69 W4 Section Slide Set 6 slide 2/9 Contents Slide Set 6 for CM 69 Winter 24 Lecture Section Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary

More information

CS 152 Midterm 2 May 2, 2002 Bob Brodersen

CS 152 Midterm 2 May 2, 2002 Bob Brodersen CS 152 Midterm 2 May 2, 2002 Bob Brodersen Name Solutions Show your work if you want partial credit! Try all the problems, don t get stuck on one of them. Each one is worth 10 points. 1) 2) 3) 4) 5) 6)

More information

Enhancing Performance in Multiple Execution Unit Architecture using Tomasulo Algorithm

Enhancing Performance in Multiple Execution Unit Architecture using Tomasulo Algorithm Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS

PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 1 submission

More information

06 1 MIPS Implementation Pipelined DLX and MIPS Implementations: Hardware, notation, hazards.

06 1 MIPS Implementation Pipelined DLX and MIPS Implementations: Hardware, notation, hazards. 06 1 MIPS Implementation 06 1 Material from Chapter 3 of H&P (for DLX). Material from Chapter 6 of P&H (for MIPS). line: (In this set.) Unpipelined DLX Implementation. (Diagram only.) Pipelined DLX and

More information

Advanced Devices. Registers Counters Multiplexers Decoders Adders. CSC258 Lecture Slides Steve Engels, 2006 Slide 1 of 20

Advanced Devices. Registers Counters Multiplexers Decoders Adders. CSC258 Lecture Slides Steve Engels, 2006 Slide 1 of 20 Advanced Devices Using a combination of gates and flip-flops, we can construct more sophisticated logical devices. These devices, while more complex, are still considered fundamental to basic logic design.

More information

Very Short Answer: (1) (1) Peak performance does or does not track observed performance.

Very Short Answer: (1) (1) Peak performance does or does not track observed performance. Very Short Answer: (1) (1) Peak performance does or does not track observed performance. (2) (1) Which is more effective, dynamic or static branch prediction? (3) (1) Do benchmarks remain valid indefinitely?

More information

Logic Design II (17.342) Spring Lecture Outline

Logic Design II (17.342) Spring Lecture Outline Logic Design II (17.342) Spring 2012 Lecture Outline Class # 03 February 09, 2012 Dohn Bowden 1 Today s Lecture Registers and Counters Chapter 12 2 Course Admin 3 Administrative Admin for tonight Syllabus

More information

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath Objectives Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath In the previous chapters we have studied how to develop a specification from a given application, and

More information

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics 1) Explain why & how a MOSFET works VLSI Design: 2) Draw Vds-Ids curve for a MOSFET. Now, show how this curve changes (a) with increasing Vgs (b) with increasing transistor width (c) considering Channel

More information

Microprocessor Design

Microprocessor Design Microprocessor Design Principles and Practices With VHDL Enoch O. Hwang Brooks / Cole 2004 To my wife and children Windy, Jonathan and Michelle Contents 1. Designing a Microprocessor... 2 1.1 Overview

More information

Modeling Digital Systems with Verilog

Modeling Digital Systems with Verilog Modeling Digital Systems with Verilog Prof. Chien-Nan Liu TEL: 03-4227151 ext:34534 Email: jimmy@ee.ncu.edu.tw 6-1 Composition of Digital Systems Most digital systems can be partitioned into two types

More information

COMP12111: Fundamentals of Computer Engineering

COMP12111: Fundamentals of Computer Engineering COMP2: Fundamentals of Computer Engineering Part I Course Overview & Introduction to Logic Paul Nutter Introduction What is this course about? Computer hardware design o not electronics nothing nasty like

More information

6.3 Sequential Circuits (plus a few Combinational)

6.3 Sequential Circuits (plus a few Combinational) 6.3 Sequential Circuits (plus a few Combinational) Logic Gates: Fundamental Building Blocks Introduction to Computer Science Robert Sedgewick and Kevin Wayne Copyright 2005 http://www.cs.princeton.edu/introcs

More information

Pipeline design. Mehran Rezaei

Pipeline design. Mehran Rezaei Pipeline design Mehran Rezaei Shift Left 2 pc Opcode ExtOp Cont Unit RegDst Addr Addr2 Addr npcsle Reg ALUSrc Mem 2 OVF Branch ALUCtr MemtoReg Mem Funct Extension ALUOp ALU Cont Shift Left 2 ID EXE MEM

More information

EECS150 - Digital Design Lecture 9 - CPU Microarchitecture. CMOS Devices

EECS150 - Digital Design Lecture 9 - CPU Microarchitecture. CMOS Devices EECS150 - Digital Design Lecture 9 - CPU Microarchitecture Feb 17, 2009 John Wawrzynek Spring 2009 EECS150 - Lec9-cpu Page 1 CMOS Devices Review: Transistor switch-level models The gate acts like a capacitor.

More information

On the Rules of Low-Power Design

On the Rules of Low-Power Design On the Rules of Low-Power Design (and How to Break Them) Prof. Todd Austin Advanced Computer Architecture Lab University of Michigan austin@umich.edu Once upon a time 1 Rules of Low-Power Design P = acv

More information

BUSES IN COMPUTER ARCHITECTURE

BUSES IN COMPUTER ARCHITECTURE BUSES IN COMPUTER ARCHITECTURE The processor, main memory, and I/O devices can be interconnected by means of a common bus whose primary function is to provide a communication path for the transfer of data.

More information

Contents Circuits... 1

Contents Circuits... 1 Contents Circuits... 1 Categories of Circuits... 1 Description of the operations of circuits... 2 Classification of Combinational Logic... 2 1. Adder... 3 2. Decoder:... 3 Memory Address Decoder... 5 Encoder...

More information

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 29 Minimizing Switched Capacitance-III. (Refer

More information

COMP sequential logic 1 Jan. 25, 2016

COMP sequential logic 1 Jan. 25, 2016 OMP 273 5 - sequential logic 1 Jan. 25, 2016 Sequential ircuits All of the circuits that I have discussed up to now are combinational digital circuits. For these circuits, each output is a logical combination

More information

An Adaptive Technique for Reducing Leakage and Dynamic Power in Register Files and Reorder Buffers

An Adaptive Technique for Reducing Leakage and Dynamic Power in Register Files and Reorder Buffers An Adaptive Technique for Reducing Leakage and Dynamic Power in Register Files and Reorder Buffers Shadi T. Khasawneh and Kanad Ghose Department of Computer Science State University of New York, Binghamton,

More information

Lecture 0: Organization

Lecture 0: Organization 581365 Tietokoneen rakenne Computer Organization II Spring 2010 Tiina Niklander Matemaattis-luonnontieteellinen tiedekunta Computer Organization II Advanced (master) level course! Prerequisite: Computer

More information

Registers. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers

Registers. Unit 12 Registers and Counters. Registers (D Flip-Flop based) Register Transfers (example not out of text) Accumulator Registers Unit 2 Registers and Counters Fundamentals of Logic esign EE2369 Prof. Eric Maconald Fall Semester 23 Registers Groups of flip-flops Can contain data format can be unsigned, 2 s complement and other more

More information

FPGA Development for Radar, Radio-Astronomy and Communications

FPGA Development for Radar, Radio-Astronomy and Communications John-Philip Taylor Room 7.03, Department of Electrical Engineering, Menzies Building, University of Cape Town Cape Town, South Africa 7701 Tel: +27 82 354 6741 email: tyljoh010@myuct.ac.za Internet: http://www.uct.ac.za

More information

Logic Devices for Interfacing, The 8085 MPU Lecture 4

Logic Devices for Interfacing, The 8085 MPU Lecture 4 Logic Devices for Interfacing, The 8085 MPU Lecture 4 1 Logic Devices for Interfacing Tri-State devices Buffer Bidirectional Buffer Decoder Encoder D Flip Flop :Latch and Clocked 2 Tri-state Logic Outputs

More information

CS 110 Computer Architecture. Finite State Machines, Functional Units. Instructor: Sören Schwertfeger.

CS 110 Computer Architecture. Finite State Machines, Functional Units. Instructor: Sören Schwertfeger. CS 110 Computer Architecture Finite State Machines, Functional Units Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University

More information

CS/ECE 250: Computer Architecture. Basics of Logic Design: ALU, Storage, Tristate. Benjamin Lee

CS/ECE 250: Computer Architecture. Basics of Logic Design: ALU, Storage, Tristate. Benjamin Lee CS/ECE 25: Computer Architecture Basics of Logic esign: ALU, Storage, Tristate Benjamin Lee Slides based on those from Alvin Lebeck, aniel, Andrew Hilton, Amir Roth, Gershon Kedem Homework #3 ue Mar 7,

More information

Read-only memory (ROM) Digital logic: ALUs Sequential logic circuits. Don't cares. Bus

Read-only memory (ROM) Digital logic: ALUs Sequential logic circuits. Don't cares. Bus Digital logic: ALUs Sequential logic circuits CS207, Fall 2004 October 11, 13, and 15, 2004 1 Read-only memory (ROM) A form of memory Contents fixed when circuit is created n input lines for 2 n addressable

More information

COE328 Course Outline. Fall 2007

COE328 Course Outline. Fall 2007 COE28 Course Outline Fall 2007 1 Objectives This course covers the basics of digital logic circuits and design. Through the basic understanding of Boolean algebra and number systems it introduces the student

More information

CS 61C: Great Ideas in Computer Architecture

CS 61C: Great Ideas in Computer Architecture CS 6C: Great Ideas in Computer Architecture Combinational and Sequential Logic, Boolean Algebra Instructor: Alan Christopher 7/23/24 Summer 24 -- Lecture #8 Review of Last Lecture OpenMP as simple parallel

More information

THE USE OF forward error correction (FEC) in optical networks

THE USE OF forward error correction (FEC) in optical networks IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 8, AUGUST 2005 461 A High-Speed Low-Complexity Reed Solomon Decoder for Optical Communications Hanho Lee, Member, IEEE Abstract

More information

Previous Lecture Sequential Circuits. Slide Summary of contents covered in this lecture. (Refer Slide Time: 01:55)

Previous Lecture Sequential Circuits. Slide Summary of contents covered in this lecture. (Refer Slide Time: 01:55) Previous Lecture Sequential Circuits Digital VLSI System Design Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture No 7 Sequential Circuit Design Slide

More information

Pipelining. Improve performance by increasing instruction throughput Program execution order. Data access. Instruction. fetch. Data access.

Pipelining. Improve performance by increasing instruction throughput Program execution order. Data access. Instruction. fetch. Data access. Chapter 6 Pipelining Improve performance by increasing instrction throghpt Program eection order Time (in instrctions) lw $, ($) Instrction fetch 2 4 6 8 2 4 6 8 ALU Data access lw $2, 2($) 8 ns Instrction

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

CSE140L: Components and Design Techniques for Digital Systems Lab. CPU design and PLDs. Tajana Simunic Rosing. Source: Vahid, Katz

CSE140L: Components and Design Techniques for Digital Systems Lab. CPU design and PLDs. Tajana Simunic Rosing. Source: Vahid, Katz CSE140L: Components and Design Techniques for Digital Systems Lab CPU design and PLDs Tajana Simunic Rosing Source: Vahid, Katz 1 Lab #3 due Lab #4 CPU design Today: CPU design - lab overview PLDs Updates

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 1-Bus Architecture and Datapath 10262011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline 1-Bus Microarchitecture and

More information

A VLIW Processor for Multimedia Applications

A VLIW Processor for Multimedia Applications A VLIW Processor for Multimedia Applications E. Holmann T. Yoshida A. Yamada Y. Shimazu Mitsubishi Electric Corporation, System LSI Laboratory 4-1 Mizuhara, Itami, Hyogo 664, Japan Outline Objective System

More information

EECS150 - Digital Design Lecture 3 Synchronous Digital Systems Review. Announcements

EECS150 - Digital Design Lecture 3 Synchronous Digital Systems Review. Announcements EECS150 - Digital Design Lecture 3 Synchronous Digital Systems Review September 1, 2011 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http://www-inst.eecs.berkeley.edu/~cs150

More information

Sequential Elements con t Synchronous Digital Systems

Sequential Elements con t Synchronous Digital Systems ecture 15 Computer Science 61C Spring 2017 February 22th, 2017 Sequential Elements con t Synchronous Digital Systems 1 Administrivia I Good news: Waitlist students: You are in! Concurrent Enrollment students:

More information

Slide Set 7. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

Slide Set 7. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary Slide Set 7 for ENEL 353 Fall 216 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Fall Term, 216 SN s ENEL 353 Fall 216 Slide Set 7 slide

More information

ALONG with the progressive device scaling, semiconductor

ALONG with the progressive device scaling, semiconductor IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 4, APRIL 2010 285 LUT Optimization for Memory-Based Computation Pramod Kumar Meher, Senior Member, IEEE Abstract Recently, we

More information

A video signal processor for motioncompensated field-rate upconversion in consumer television

A video signal processor for motioncompensated field-rate upconversion in consumer television A video signal processor for motioncompensated field-rate upconversion in consumer television B. De Loore, P. Lippens, P. Eeckhout, H. Huijgen, A. Löning, B. McSweeney, M. Verstraelen, B. Pham, G. de Haan,

More information

ASIC = Application specific integrated circuit

ASIC = Application specific integrated circuit ASIC = Application specific integrated circuit CS 2630 Computer Organization Meeting 19: Building a MIPS processor Brandon Myers University of Iowa The goal: implement most of MIPS So far Implementing

More information

BCN1043. By Dr. Mritha Ramalingam. Faculty of Computer Systems & Software Engineering

BCN1043. By Dr. Mritha Ramalingam. Faculty of Computer Systems & Software Engineering BCN1043 By Dr. Mritha Ramalingam Faculty of Computer Systems & Software Engineering mritha@ump.edu.my http://ocw.ump.edu.my/ authors Dr. Mohd Nizam Mohmad Kahar (mnizam@ump.edu.my) Jamaludin Sallim (jamal@ump.edu.my)

More information

Digital Circuits 4: Sequential Circuits

Digital Circuits 4: Sequential Circuits Digital Circuits 4: Sequential Circuits Created by Dave Astels Last updated on 2018-04-20 07:42:42 PM UTC Guide Contents Guide Contents Overview Sequential Circuits Onward Flip-Flops R-S Flip Flop Level

More information

Digilent Nexys-3 Cellular RAM Controller Reference Design Overview

Digilent Nexys-3 Cellular RAM Controller Reference Design Overview Digilent Nexys-3 Cellular RAM Controller Reference Design Overview General Overview This document describes a reference design of the Cellular RAM (or PSRAM Pseudo Static RAM) controller for the Digilent

More information

Design Project: Designing a Viterbi Decoder (PART I)

Design Project: Designing a Viterbi Decoder (PART I) Digital Integrated Circuits A Design Perspective 2/e Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić Chapters 6 and 11 Design Project: Designing a Viterbi Decoder (PART I) 1. Designing a Viterbi

More information

CHAPTER 4 RESULTS & DISCUSSION

CHAPTER 4 RESULTS & DISCUSSION CHAPTER 4 RESULTS & DISCUSSION 3.2 Introduction This project aims to prove that Modified Baugh-Wooley Two s Complement Signed Multiplier is one of the high speed multipliers. The schematic of the multiplier

More information

Sequential Logic. Introduction to Computer Yung-Yu Chuang

Sequential Logic. Introduction to Computer Yung-Yu Chuang Sequential Logic Introduction to Computer Yung-Yu Chuang with slides by Sedgewick & Wayne (introcs.cs.princeton.edu), Nisan & Schocken (www.nand2tetris.org) and Harris & Harris (DDCA) Review of Combinational

More information

Chapter 3. Boolean Algebra and Digital Logic

Chapter 3. Boolean Algebra and Digital Logic Chapter 3 Boolean Algebra and Digital Logic Chapter 3 Objectives Understand the relationship between Boolean logic and digital computer circuits. Learn how to design simple logic circuits. Understand how

More information

CPS311 Lecture: Sequential Circuits

CPS311 Lecture: Sequential Circuits CPS311 Lecture: Sequential Circuits Last revised August 4, 2015 Objectives: 1. To introduce asynchronous and synchronous flip-flops (latches and pulsetriggered, plus asynchronous preset/clear) 2. To introduce

More information

CHAPTER 4: Logic Circuits

CHAPTER 4: Logic Circuits CHAPTER 4: Logic Circuits II. Sequential Circuits Combinational circuits o The outputs depend only on the current input values o It uses only logic gates, decoders, multiplexers, ALUs Sequential circuits

More information

Implementation of Memory Based Multiplication Using Micro wind Software

Implementation of Memory Based Multiplication Using Micro wind Software Implementation of Memory Based Multiplication Using Micro wind Software U.Palani 1, M.Sujith 2,P.Pugazhendiran 3 1 IFET College of Engineering, Department of Information Technology, Villupuram 2,3 IFET

More information

Outline. EECS150 - Digital Design Lecture 27 - Asynchronous Sequential Circuits. Cross-coupled NOR gates. Asynchronous State Transition Diagram

Outline. EECS150 - Digital Design Lecture 27 - Asynchronous Sequential Circuits. Cross-coupled NOR gates. Asynchronous State Transition Diagram EECS150 - Digital Design Lecture 27 - Asynchronous Sequential Circuits Nov 26, 2002 John Wawrzynek Outline SR Latches and other storage elements Synchronizers Figures from Digital Design, John F. Wakerly

More information

UNIT V 8051 Microcontroller based Systems Design

UNIT V 8051 Microcontroller based Systems Design UNIT V 8051 Microcontroller based Systems Design INTERFACING TO ALPHANUMERIC DISPLAYS Many microprocessor-controlled instruments and machines need to display letters of the alphabet and numbers. Light

More information

EECS150 - Digital Design Lecture 10 - Interfacing. Recap and Topics

EECS150 - Digital Design Lecture 10 - Interfacing. Recap and Topics EECS150 - Digital Design Lecture 10 - Interfacing Oct. 1, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)

More information

A Reed Solomon Product-Code (RS-PC) Decoder Chip for DVD Applications

A Reed Solomon Product-Code (RS-PC) Decoder Chip for DVD Applications IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001 229 A Reed Solomon Product-Code (RS-PC) Decoder Chip DVD Applications Hsie-Chia Chang, C. Bernard Shung, Member, IEEE, and Chen-Yi Lee

More information

CS/EE 6710 Digital VLSI Design CAD Assignment #3 Due Thursday September 21 st, 5:00pm

CS/EE 6710 Digital VLSI Design CAD Assignment #3 Due Thursday September 21 st, 5:00pm CS/EE 6710 Digital VLSI Design CAD Assignment #3 Due Thursday September 21 st, 5:00pm Overview: In this assignment you will design a register cell. This cell should be a single-bit edge-triggered D-type

More information

Tomasulo Algorithm Based Out of Order Execution Processor

Tomasulo Algorithm Based Out of Order Execution Processor Tomasulo Algorithm Based Out of Order Execution Processor Bhavana P.Shrivastava MAaulana Azad National Institute of Technology, Department of Electronics and Communication ABSTRACT In this research work,

More information

Design and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture

Design and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture Design and Implementation of Partial Reconfigurable Fir Filter Using Distributed Arithmetic Architecture Vinaykumar Bagali 1, Deepika S Karishankari 2 1 Asst Prof, Electrical and Electronics Dept, BLDEA

More information

Laboratory Exercise 4

Laboratory Exercise 4 Laboratory Exercise 4 Polling and Interrupts The purpose of this exercise is to learn how to send and receive data to/from I/O devices. There are two methods used to indicate whether or not data can be

More information

AN INTRODUCTION TO DIGITAL COMPUTER LOGIC

AN INTRODUCTION TO DIGITAL COMPUTER LOGIC SUPPLEMENTRY HPTER 1 N INTRODUTION TO DIGITL OMPUTER LOGI J K J K FREE OMPUTER HIPS FREE HOOLTE HIPS I keep telling you Gwendolyth, you ll never attract today s kids that way. S1.0 INTRODUTION 1 2 Many

More information

A Fast Constant Coefficient Multiplier for the XC6200

A Fast Constant Coefficient Multiplier for the XC6200 A Fast Constant Coefficient Multiplier for the XC6200 Tom Kean, Bernie New and Bob Slous Xilinx Inc. Abstract. We discuss the design of a high performance constant coefficient multiplier on the Xilinx

More information

Sequential Logic Design CS 64: Computer Organization and Design Logic Lecture #14

Sequential Logic Design CS 64: Computer Organization and Design Logic Lecture #14 Sequential Logic Design CS 64: Computer Organization and Design Logic Lecture #14 Ziad Matni Dept. of Computer Science, UCSB Administrative Only 2.5 weeks left!!!!!!!! OMG!!!!! Th. 5/24 Sequential Logic

More information

DIGITAL SYSTEM FUNDAMENTALS (ECE421) DIGITAL ELECTRONICS FUNDAMENTAL (ECE422) LATCHES and FLIP-FLOPS

DIGITAL SYSTEM FUNDAMENTALS (ECE421) DIGITAL ELECTRONICS FUNDAMENTAL (ECE422) LATCHES and FLIP-FLOPS COURSE / CODE DIGITAL SYSTEM FUNDAMENTALS (ECE421) DIGITAL ELECTRONICS FUNDAMENTAL (ECE422) LATCHES and FLIP-FLOPS In the same way that logic gates are the building blocks of combinatorial circuits, latches

More information

Performance Driven Reliable Link Design for Network on Chips

Performance Driven Reliable Link Design for Network on Chips Performance Driven Reliable Link Design for Network on Chips Rutuparna Tamhankar Srinivasan Murali Prof. Giovanni De Micheli Stanford University Outline Introduction Objective Logic design and implementation

More information

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors CSC258 Week 5 1 We are here Assembly Language Processors Arithmetic Logic Units Devices Finite State Machines Flip-flops Circuits Gates Transistors 2 Circuits using flip-flops Now that we know about flip-flops

More information

Research Article. Implementation of Low Power, Delay and Area Efficient Shifters for Memory Based Computation

Research Article. Implementation of Low Power, Delay and Area Efficient Shifters for Memory Based Computation International Journal of Modern Science and Technology Vol. 2, No. 5, 2017. Page 217-222. http://www.ijmst.co/ ISSN: 2456-0235. Research Article Implementation of Low Power, Delay and Area Efficient Shifters

More information

OUT-OF-ORDER processors with precise exceptions

OUT-OF-ORDER processors with precise exceptions TRANSACTIONS ON COMPUTER, VOL. X, NO. Y, FEBRUARY 2009 1 Store Buffer Design for Multibanked Data Caches Enrique Torres, Member, IEEE, Pablo Ibáñez, Member, IEEE, Víctor Viñals-Yúfera, Member, IEEE, and

More information

UNIVERSITY OF TORONTO JOÃO MARCUS RAMOS BACALHAU GUSTAVO MAIA FERREIRA HEYANG WANG ECE532 FINAL DESIGN REPORT HOLE IN THE WALL

UNIVERSITY OF TORONTO JOÃO MARCUS RAMOS BACALHAU GUSTAVO MAIA FERREIRA HEYANG WANG ECE532 FINAL DESIGN REPORT HOLE IN THE WALL UNIVERSITY OF TORONTO JOÃO MARCUS RAMOS BACALHAU GUSTAVO MAIA FERREIRA HEYANG WANG ECE532 FINAL DESIGN REPORT HOLE IN THE WALL Toronto 2015 Summary 1 Overview... 5 1.1 Motivation... 5 1.2 Goals... 5 1.3

More information

Combinational vs Sequential

Combinational vs Sequential Combinational vs Sequential inputs X Combinational Circuits outputs Z A combinational circuit: At any time, outputs depends only on inputs Changing inputs changes outputs No regard for previous inputs

More information

EET 1131 Lab #12 - Page 1 Revised 8/10/2018

EET 1131 Lab #12 - Page 1 Revised 8/10/2018 Name EET 1131 Lab #12 Shift Registers Equipment and Components Safety glasses ETS-7000 Digital-Analog Training System Integrated Circuits: 74164, 74195 Quartus II software and Altera DE2-115 board Shift

More information

FPGA Implementation of DA Algritm for Fir Filter

FPGA Implementation of DA Algritm for Fir Filter International Journal of Computational Engineering Research Vol, 03 Issue, 8 FPGA Implementation of DA Algritm for Fir Filter 1, Solmanraju Putta, 2, J Kishore, 3, P. Suresh 1, M.Tech student,assoc. Prof.,Professor

More information

WITH the demand of higher video quality, lower bit

WITH the demand of higher video quality, lower bit IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 8, AUGUST 2006 917 A High-Definition H.264/AVC Intra-Frame Codec IP for Digital Video and Still Camera Applications Chun-Wei

More information

Computer Graphics NV1 (1DT383) Computer Graphics (1TT180) Cary Laxer, Ph.D. Visiting Lecturer

Computer Graphics NV1 (1DT383) Computer Graphics (1TT180) Cary Laxer, Ph.D. Visiting Lecturer Computer Graphics NV1 (1DT383) Computer Graphics (1TT180) Cary Laxer, Ph.D. Visiting Lecturer Today s class Introductions Graphics system overview Thursday, October 25, 2007 Computer Graphics - Class 1

More information

CHAPTER 4: Logic Circuits

CHAPTER 4: Logic Circuits CHAPTER 4: Logic Circuits II. Sequential Circuits Combinational circuits o The outputs depend only on the current input values o It uses only logic gates, decoders, multiplexers, ALUs Sequential circuits

More information

CprE 281: Digital Logic

CprE 281: Digital Logic CprE 28: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Registers and Counters CprE 28: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev

More information

EE178 Lecture Module 4. Eric Crabill SJSU / Xilinx Fall 2005

EE178 Lecture Module 4. Eric Crabill SJSU / Xilinx Fall 2005 EE178 Lecture Module 4 Eric Crabill SJSU / Xilinx Fall 2005 Lecture #9 Agenda Considerations for synchronizing signals. Clocks. Resets. Considerations for asynchronous inputs. Methods for crossing clock

More information

Video Output and Graphics Acceleration

Video Output and Graphics Acceleration Video Output and Graphics Acceleration Overview Frame Buffer and Line Drawing Engine Prof. Kris Pister TAs: Vincent Lee, Ian Juch, Albert Magyar Version 1.5 In this project, you will use SDRAM to implement

More information