Register Transfer Level (RTL) Design Cont.

CSE140: Components and Design Techniques for Digital Systems. Register Transfer Level (RTL) Design Cont. Tajana Simunic Rosing

Where we are now
What we are covering today: RTL design examples, RTL critical path analysis, CPU design.
CAPEs are out! https://cape.ucsd.edu/students/ Your feedback is very important, so please take the time to fill out the survey. I read all your feedback carefully and use it to guide the design of future courses. If at least 25 students do CAPEs, I will drop the lowest quiz grade!
Deadlines: HW#6 is due today (the last HW). Exam #3 is during finals week (the last exam): 8 minutes long, comprehensive. Bring one 8 ½ x 11 sheet of paper with handwritten notes, but nothing else.
Sample midterm 3 has been posted. Problem 4: we did not cover the three-guidelines heuristic; all else is OK. Problem 5: we did not cover PLAs, so just skip it.
Extra professor office hour during finals week on Monday :3-2:3pm. Extra TA/tutor office hours starting this week, from Friday morning through Tuesday at :3am.

Data vs. Control Dominated RTL Design
Data dominant design: extensive datapath, simple controller. Control dominant design: complex controller, simple datapath.
Example of a data dominant design: a simple filter, which converts a digital input stream to a new digital output stream, e.g., to remove noise. In the stream 180, 180, 181, 180, 240, 180, 181, the 240 is probably noise, and the filter might replace it by 181.
Simple filter: output the average of the last N values. Small N: less filtering. Large N: more filtering, but less sharp output.
[Figure: digital filter block with 12-bit input X, clock clk, and 12-bit output Y]
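A minimal Python sketch of the "average of the last N values" filter, useful for seeing the small-N/large-N trade-off. It is a behavioral model, not an RTL design; `moving_average` is a hypothetical name, and averaging over fewer samples until N have arrived is an assumption (hardware might instead start from cleared registers).

```python
from collections import deque

def moving_average(stream, n):
    """Yield the average of the last n samples seen so far.

    Before n samples have arrived, the average is taken over the samples
    available so far (an assumption, not something the slide specifies).
    """
    window = deque(maxlen=n)  # the oldest sample drops out automatically
    for x in stream:
        window.append(x)
        yield sum(window) / len(window)

samples = [180, 180, 181, 180, 240, 180, 181]
print(max(moving_average(samples, 2)), max(moving_average(samples, 4)))
```

With this stream, N = 2 lets the outlier pull the output up to 210, while N = 4 peaks at 195.25: more filtering, at the cost of responding more slowly to real changes.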

Data Dominated RTL Design Example: FIR Filter
FIR filter: Finite Impulse Response. A configurable weighted sum of past input values:
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
The above is known as a 3-tap filter; tens of taps are more common. It is a very general filter: the user sets the constants (c0, c1, c2) to define a specific filter.
RTL design Step 1: Create the HLSM (very simple states/transitions).
Inputs: X (12 bits). Outputs: Y (12 bits). Local storage: xt0, xt1, xt2, c0, c1, c2 (12 bits); Yreg (12 bits).
State Init: Yreg := 0, xt0 := 0, xt1 := 0, xt2 := 0, c0 := 3, c1 := 2, c2 := 2.
State FC: Yreg := c0*xt0 + c1*xt1 + c2*xt2, xt0 := X, xt1 := xt0, xt2 := xt1. Output Y comes from Yreg.
Assumes the constants are set to 3, 2, and 2.
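A cycle-by-cycle Python sketch of the 3-tap FIR HLSM, useful for checking what the register transfers actually compute. The key point it illustrates: all assignments in state FC happen at the same clock edge, so every right-hand side reads the values from before that edge, and Yreg therefore lags the input by one cycle. The function name `fir3_hlsm` and the wrap-around at 12 bits are assumptions.

```python
def fir3_hlsm(inputs, c=(3, 2, 2), width=12):
    """Simulate the FIR HLSM: Init once, then state FC on every clock edge.

    Returns the value of Yreg after each clock edge.
    """
    mask = (1 << width) - 1  # registers hold `width` bits; overflow wraps (assumption)
    c0, c1, c2 = c           # Init: c0 := 3, c1 := 2, c2 := 2 by default
    xt0 = xt1 = xt2 = 0      # Init: xt0 := xt1 := xt2 := 0 (and Yreg := 0)
    outputs = []
    for x in inputs:
        # State FC: compute every right-hand side from the pre-edge values...
        yreg = (c0 * xt0 + c1 * xt1 + c2 * xt2) & mask
        # ...then update all registers simultaneously (the shift chain).
        xt0, xt1, xt2 = x & mask, xt0, xt1
        outputs.append(yreg)
    return outputs

print(fir3_hlsm([1, 0, 0, 0]))
```

Feeding a single 1 (an impulse) prints `[0, 3, 2, 2]`: the coefficients c0, c1, c2 appear in order, one cycle late, which is exactly the "impulse response" the filter is named after.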

FIR Filter: Create datapath
Begin by creating a chain of xt registers to hold past values of X. Instantiate registers for c0, c1, c2. Instantiate multipliers to compute the c*x values. Instantiate adders to form y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2). Add circuitry to allow loading of a particular c register.
Steps 3 & 4: Connect to controller, Create FSM: no controller needed.
[Figure: 3-tap FIR filter datapath: register chain xt0, xt1, xt2 holding x(t), x(t-1), x(t-2); coefficient registers c0, c1, c2 loaded through a 2x4 decoder with enable e (control inputs CL, Ca1, Ca0, C); three multipliers and two adders feed yreg, which drives Y]

FIR Filter: Design the Circuit
Inputs: X (12 bits). Outputs: Y (12 bits). Local storage: xt0, xt1, xt2, c0, c1, c2 (12 bits); Yreg (12 bits).
Steps: create the datapath, connect the controller and datapath, derive the FSM (set the clr and ld lines appropriately).
HLSM: Init: Yreg := 0, xt0 := 0, xt1 := 0, xt2 := 0, c0 := 3, c1 := 2, c2 := 2. FC: Yreg := c0*xt0 + c1*xt1 + c2*xt2, xt0 := X, xt1 := xt0, xt2 := xt1.
[Figure: datapath for the 3-tap FIR filter: constants 3, 2, 2 feed registers c0, c1, c2 (load lines c0_ld, c1_ld, c2_ld); X feeds the xt0, xt1, xt2 chain (control lines xt0_clr, xt0_ld, ...); three multipliers and two adders feed Yreg (Yreg_clr, Yreg_ld), which drives the 12-bit output Y]
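"Derive FSM: set clr and ld lines appropriately" can be written down as a table from state to asserted control lines. A minimal Python sketch, assuming hypothetical signal names that follow the figure's pattern (xt0_clr, xt0_ld, c0_ld, Yreg_ld, ...); any line not listed is 0.

```python
def fir_controller(state):
    """Return (next_state, set of asserted control lines) for the FIR FSM."""
    if state == "Init":
        # Clear the data registers and load the constant coefficients 3, 2, 2.
        signals = {"Yreg_clr", "xt0_clr", "xt1_clr", "xt2_clr",
                   "c0_ld", "c1_ld", "c2_ld"}
        return "FC", signals
    if state == "FC":
        # Every cycle: shift in a new sample and capture the weighted sum.
        return "FC", {"Yreg_ld", "xt0_ld", "xt1_ld", "xt2_ld"}
    raise ValueError(f"unknown state: {state}")

state = "Init"
for _ in range(3):
    state, lines = fir_controller(state)
    print(state, sorted(lines))
```

The FSM never leaves FC, which is why the slides can say that essentially no controller is needed: after initialization, the ld lines are simply held at 1.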

Comparing the FIR circuit to a software implementation
Circuit: an adder has a 2-gate delay and a multiplier has a 20-gate delay. The longest path goes through one multiplier and two adders: 20 + 2 + 2 = 24-gate delay. A 100-tap filter would have about a 34-gate delay: 1 multiplier and 7 adders on the longest path.
Software: a 100-tap filter needs 100 multiplications and 100 additions. Assume 2 instructions per multiplication, 2 per addition, and a 10-gate delay per instruction: (100*2 + 100*2)*10 = 4000 gate delays.
[Figure: 3-tap FIR filter datapath computing y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)]
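The gate-delay arithmetic above as a small Python sketch, so other tap counts can be tried. The delay constants come from the slide; modeling the adders as a balanced tree is an assumption, chosen because it reproduces the slide's "7 adders" for 100 taps (ceil(log2 100) = 7).

```python
ADDER_GATE_DELAY = 2              # gate delays through one adder (slide)
MULTIPLIER_GATE_DELAY = 20        # gate delays through one multiplier (slide)
INSTRUCTIONS_PER_OP = 2           # per multiplication and per addition (slide's assumption)
GATE_DELAY_PER_INSTRUCTION = 10   # slide's assumption

def circuit_gate_delay(taps):
    """Longest path: one multiplier followed by a balanced tree of adders."""
    adder_levels = (taps - 1).bit_length()  # equals ceil(log2(taps)) for taps >= 1
    return MULTIPLIER_GATE_DELAY + ADDER_GATE_DELAY * adder_levels

def software_gate_delay(taps):
    """Every multiplication and addition executes one after another."""
    instructions = taps * INSTRUCTIONS_PER_OP + taps * INSTRUCTIONS_PER_OP
    return instructions * GATE_DELAY_PER_INSTRUCTION

for taps in (3, 100):
    print(taps, circuit_gate_delay(taps), software_gate_delay(taps))
```

For 100 taps this gives 34 versus 4000 gate delays, so the circuit is more than 100 times faster by this measure: the hardware evaluates all products in parallel, while the software evaluates them in sequence.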

RTL: Determining Clock Frequency
Frequency is limited by the longest register-to-register delay, known as the critical path. There are more components to the critical path: wire delays, setup/hold constraints, etc.
[Figure: registers a, b, c, d connected through an adder (2 ns delay) and a multiplier (5 ns delay); register-to-register paths of 2, 7, 7, and 5 ns]
Longest path: max(2, 7, 7, 5) = 7 ns. Fastest frequency: 1 / 7 ns = 142 MHz.
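A minimal Python sketch of the critical-path calculation as a longest-path search over a directed acyclic graph of components. Like the slide's example, it ignores wire delays and setup/hold times. The function names and the exact wiring in `EDGES` are hypothetical; the wiring was chosen so its register-to-register paths take 2, 7, and 5 ns, matching the slide's 7 ns maximum.

```python
from functools import lru_cache

def critical_path_ns(delays, edges, sources, sinks):
    """Longest source-to-sink delay through a DAG of combinational components.

    delays:  component -> propagation delay in ns (registers omitted, i.e. 0 ns)
    edges:   node -> list of nodes it drives
    sources: register outputs where timing paths start
    sinks:   register inputs where timing paths end
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        own = delays.get(node, 0.0)
        if node in sinks:
            return own
        successors = edges.get(node, [])
        if not successors:
            return float("-inf")  # never reaches a register: not a timing path
        return own + max(longest_from(s) for s in successors)

    return max(longest_from(s) for s in sources)

def max_frequency_mhz(critical_ns):
    # f = 1 / T; with T in ns, 1 / (T ns) = (1000 / T) MHz.
    return 1000.0 / critical_ns

DELAYS = {"add": 2.0, "mul": 5.0}
EDGES = {
    "a": ["add"], "b": ["add"],   # registers a and b feed the adder
    "add": ["mul", "r_sum"],      # adder output is stored and also multiplied
    "c": ["mul"],
    "mul": ["r_prod"],
}
cp = critical_path_ns(DELAYS, EDGES, ("a", "b", "c"), frozenset({"r_sum", "r_prod"}))
print(cp, round(max_frequency_mhz(cp), 1))
```

This prints `7.0 142.9`; the slide truncates 1 / 7 ns to 142 MHz. Memoizing `longest_from` means each component is visited once, no matter how many paths pass through it.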

RTL: A Circuit May Have Numerous Paths
Paths can exist (a) in the datapath, (b) in the controller, and (c) between the controller and the datapath. There may be hundreds or thousands of paths; timing analysis tools need to evaluate all possible paths.
[Figure: controller (state register s1 s0, next-state lines n1 n0, input c, outputs d, tot_ld, tot_clr) connected to an 8-bit datapath (tot register with ld and clr, 8-bit adder with input a, 8-bit < comparator against s producing tot_lt_s), with example paths (a), (b), and (c) marked]
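Why "hundreds or thousands of paths"? Whenever signals fan out and reconverge, the number of distinct paths multiplies. A small Python sketch that counts register-to-register paths without listing them; the `ladder` circuit, where each stage offers two alternative components that reconverge, is a hypothetical example.

```python
from functools import lru_cache

def count_paths(edges, sources, sinks):
    """Count register-to-register paths in a DAG without enumerating them."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node in sinks:
            return 1
        # Every path from `node` continues through exactly one successor.
        return sum(paths_from(s) for s in edges.get(node, []))
    return sum(paths_from(s) for s in sources)

def ladder(stages):
    """Hypothetical circuit: each stage has two alternative components
    that reconverge before the next stage."""
    edges, prev = {}, "src"
    for i in range(stages):
        a, b, join = f"a{i}", f"b{i}", f"join{i}"
        edges[prev] = [a, b]
        edges[a] = [join]
        edges[b] = [join]
        prev = join
    edges[prev] = ["dst"]
    return edges

for k in (1, 5, 10, 20):
    print(k, count_paths(ladder(k), ["src"], {"dst"}))
```

The count doubles with every stage, so 20 stages (62 nodes) already give 1,048,576 paths. This growth is why timing analysis propagates delays node by node, as in a longest-path search, rather than checking each path separately.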

RTL Summary
Datapath and control design. RTL design steps: 1. Define the high-level state machine. 2. Create the datapath. 3. Connect the datapath with control. 4. Implement the FSM.
Timing analysis: find the critical path in more complex circuits. Watch out for all possible long paths (e.g., datapath to FSM, FSM control logic, datapath logic).

CSE140: Components and Design Techniques for Digital Systems. Single Cycle CPU Design. Tajana Simunic Rosing

MIPS Single-Cycle Datapath & Control
[Figure: MIPS single-cycle datapath and control: PC, instruction memory, PC + 4 adder, branch adder with << 2, register file, Sign Extend, ALU with ALU CONTROL, data memory, and MUXes; control signals REG_DST, BRANCH, MEM_READ, MEM_TO_REG, ALU_OP, MEM_WRITE, ALU_SRC, REG_WRITE; instruction fields INSTRUCTION[31-26] (to CONTROL), [25-21] and [20-16] (read registers), [15-11] (write register via the REG_DST MUX), [15-0] (Sign Extend), [5-0] (ALU CONTROL)]
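The single-cycle datapath routes fixed bit fields of each instruction to different units. A small Python sketch of that wiring, assuming the standard MIPS32 R-type/I-type field layout; `decode_fields` is a hypothetical helper, not part of any tool.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction into the fields the datapath routes."""
    def bits(hi, lo):
        return (instr >> lo) & ((1 << (hi - lo + 1)) - 1)

    imm16 = bits(15, 0)
    # Sign Extend: if bit 15 is set, the 16-bit immediate is negative.
    imm32 = imm16 - (1 << 16) if imm16 & 0x8000 else imm16
    return {
        "opcode": bits(31, 26),  # to the main CONTROL unit
        "rs": bits(25, 21),      # READ REGISTER 1
        "rt": bits(20, 16),      # READ REGISTER 2 (write register for I-type)
        "rd": bits(15, 11),      # write register for R-type (chosen by REG_DST)
        "imm": imm32,            # output of Sign Extend
        "funct": bits(5, 0),     # to ALU CONTROL
    }

print(decode_fields(0x012A4020))  # add $t0, $t1, $t2
print(decode_fields(0x8FA8FFFC))  # lw  $t0, -4($sp)
```

For the `add`, rd = 8 ($t0) reaches the write port through the REG_DST MUX; for the `lw`, the sign-extended offset -4 reaches the ALU through the ALU_SRC MUX.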

CPU Components
Combinational logic: Boolean equations, logic gates; multiplexors and decoders; ALU: executes arithmetic/logical operations.

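2-input, 32-bit MUX
Selects one input as the output.
[Figure: 2-input, 32-bit MUX with 32-bit inputs I0 and I1, select S, and 32-bit output O; implementation as 32 one-bit 2-input MUXes sharing S, one per bit position (bit 31 down to bit 0)]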
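The "32 one-bit MUXes sharing S" implementation can be modeled with bitwise operations: fan S out to all 32 positions, then apply the one-bit MUX equation O = I0·S' + I1·S to every bit at once. A minimal Python sketch; `mux2_32` is a hypothetical name.

```python
MASK32 = 0xFFFF_FFFF

def mux2_32(i0, i1, s):
    """2-input, 32-bit MUX: O = I1 if S else I0, computed bit-parallel."""
    sel = MASK32 if s else 0               # fan the select line S out to all 32 bits
    return ((i0 & ~sel) | (i1 & sel)) & MASK32  # per bit: O = I0*S' + I1*S

print(hex(mux2_32(0x1234, 0xABCD, 0)), hex(mux2_32(0x1234, 0xABCD, 1)))
```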

Decoder
2 inputs, 2^2 = 4 outputs. A decoder translates its input into a binary number B and turns on output B.
[Figure: 2-to-4 DECODER with inputs I1, I0 and outputs O0-O3, and its gate-level implementation]
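Two small Python views of the decoder: a behavioral n-to-2^n version ("turn on output B") and a gate-level 2-to-4 version in which each output is an AND of true or inverted inputs. Both function names are hypothetical.

```python
def decoder(value, n_inputs):
    """n-to-2**n decoder: outputs O0..O(2**n - 1); only O[value] is 1."""
    if not 0 <= value < (1 << n_inputs):
        raise ValueError("input does not fit in n_inputs bits")
    return [1 if b == value else 0 for b in range(1 << n_inputs)]

def decoder_2to4_gates(i1, i0):
    """Gate-level 2-to-4 decoder built from two inverters and four AND gates."""
    n1, n0 = 1 - i1, 1 - i0                         # inverters
    return [n1 & n0, n1 & i0, i1 & n0, i1 & i0]     # O0, O1, O2, O3

print(decoder(2, 2), decoder_2to4_gates(1, 0))
```

Both calls turn on only O2. The same one-hot behavior is what lets a 5-to-32 decoder enable exactly one register in a register file.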

Full 32-bit ALU
Performs: AND, OR, NOT, ADD, SUB, overflow detection, GTE.
[Figure: 32-bit ALU with 32-bit inputs A and B, OP CODE and CarryIn inputs, and 32-bit Result, Overflow, and CarryOut outputs]
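A behavioral Python sketch of the operations this ALU performs on 32-bit two's-complement values. The slide lists the operations but not their encodings, so the string opcodes here are hypothetical; overflow is detected by checking whether the exact signed result fits in 32 bits.

```python
MASK32 = 0xFFFF_FFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x

def alu32(op, a, b):
    """Behavioral 32-bit ALU. Returns (32-bit result, overflow flag)."""
    a &= MASK32
    b &= MASK32
    if op == "AND":
        return a & b, False
    if op == "OR":
        return a | b, False
    if op == "NOT":
        return ~a & MASK32, False
    if op in ("ADD", "SUB"):
        sa, sb = to_signed(a), to_signed(b)
        exact = sa + sb if op == "ADD" else sa - sb
        # Overflow: the true signed result does not fit in 32 bits.
        overflow = not (-(1 << 31) <= exact < (1 << 31))
        return exact & MASK32, overflow
    if op == "GTE":
        # Signed comparison: 1 if A >= B, else 0.
        return int(to_signed(a) >= to_signed(b)), False
    raise ValueError(f"unsupported op: {op}")

print(alu32("ADD", 0x7FFF_FFFF, 1))  # largest positive + 1 overflows
```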

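MSB ALU
[Figure: most significant bit slice of the ALU: inputs A31, B31, Binvert, CarryIn, GTEin, and OP; an adder and a result MUX with inputs 0-4 selected by OP; outputs result and CarryOut, plus overflow and GTEout, each formed with an XOR gate]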
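The same add/subtract at the bit-slice level, in Python, to show where the MSB slice's XOR gates come in. Subtraction sets Binvert = 1 and CarryIn = 1, so the adder computes A + ~B + 1; overflow is the carry into the MSB XOR the carry out of the MSB. The GTE rule used here (after a subtraction, A >= B in signed terms exactly when sign XOR overflow is 0) is a standard construction and an assumption about how the slide's GTEout is formed; `ripple_add_sub` is a hypothetical name.

```python
def ripple_add_sub(a, b, subtract=False, width=32):
    """Ripple-carry adder/subtractor, one full-adder slice per bit.

    Returns (result, carry_out, overflow, gte); gte is None for addition.
    """
    binvert = 1 if subtract else 0
    carry = binvert                      # CarryIn of bit 0 (1 for SUB)
    result = 0
    carry_into_msb = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ binvert    # Binvert: use ~B when subtracting
        if i == width - 1:
            carry_into_msb = carry
        s = ai ^ bi ^ carry                              # full-adder sum
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # full-adder carry
        result |= s << i
    overflow = carry_into_msb ^ carry    # XOR of the MSB's carry-in and carry-out
    sign = (result >> (width - 1)) & 1
    gte = int((sign ^ overflow) == 0) if subtract else None
    return result, carry, overflow, gte

print(ripple_add_sub(7, 5, subtract=True))
```

For 7 - 5 this prints `(2, 1, 0, 1)`: result 2, a carry out (normal for subtraction of a smaller number), no overflow, and GTE = 1 because 7 >= 5.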

CPU Components
Combinational logic: Boolean equations, logic gates; multiplexors and decoders; ALU: executes arithmetic/logical operations.
Sequential logic: storage (memory) elements; counters.

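Memory elements: D-Latch
Sets the SR latch (Q) to the value of D when the clock (C) is high; otherwise the last Q is retained.
[Figure: D latch built from an SR latch of two cross-coupled NOR gates; R (reset) stores 0, S (set) stores 1; D and C drive S and R; Q is the stored state value]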
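A gate-level Python sketch of the D latch: C gates D into the S and R inputs of a cross-coupled NOR pair, and the feedback loop is re-evaluated until Q and Q' stop changing. The class name and the power-up value Q = 0 are assumptions.

```python
def nor(a, b):
    return 0 if (a or b) else 1

class DLatch:
    """D latch: SR latch of two cross-coupled NOR gates, gated by C."""

    def __init__(self):
        self.q, self.q_bar = 0, 1   # assume the latch powers up storing 0

    def update(self, d, c):
        s = d & c          # set (store 1) only while C is high and D is 1
        r = (1 - d) & c    # reset (store 0) only while C is high and D is 0
        # Re-evaluate the feedback loop until both NOR outputs settle.
        for _ in range(4):
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q

latch = DLatch()
print(latch.update(1, 1), latch.update(0, 0), latch.update(0, 1))
```

This prints `1 1 0`: D is stored while C is high, the old value is retained when C is low even though D changed, and a new D is stored once C is high again.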

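Memory elements: Flip-Flop
Stores the new value of D in Q when C falls; otherwise the current stored value of Q is retained: a falling edge-triggered FF.
[Figure: two D latches in series: D LATCH 1 (inputs D and clock C) feeds its output into D LATCH 2 (input D2, clock C2), whose output Q2 is the flip-flop output Q]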
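A behavioral Python sketch of the two-latch construction, assuming the common master-slave arrangement: the first latch is open while the clock is high, and the second is clocked by the inverted clock, so it is open while the clock is low. The net effect is that Q only picks up the value D had just before the clock fell. Class names are hypothetical.

```python
class TransparentLatch:
    """Behavioral D latch: Q follows D while enable is 1, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q

class FallingEdgeDFF:
    """Master-slave D flip-flop that updates Q when the clock falls."""

    def __init__(self):
        self.master = TransparentLatch()
        self.slave = TransparentLatch()

    def update(self, d, clk):
        m = self.master.update(d, clk)        # master open while clock is high
        return self.slave.update(m, 1 - clk)  # slave open while clock is low

ff = FallingEdgeDFF()
print([ff.update(d, clk) for d, clk in [(1, 1), (0, 0), (1, 0), (0, 1), (0, 0)]])
```

This prints `[0, 1, 1, 1, 0]`: Q changes only on the calls where the clock goes from 1 to 0, and changes to D while the clock is low (the third step) are ignored.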

Read/Write Register File
Read: input Read Reg #; a MUX selects the Q outputs of that register's set of FFs as the output.
Write: input Write Reg # and Value. Write Value goes to every FF; Write Reg # turns on C only for the selected register's FFs, where Value is stored.
[Figure: a DECODER driven by Write Reg # (5 bits) and Clock enables C on one register (outputs O0 ... O31); Write Value feeds the D inputs of every register; a 32-input, 32-bit MUX controlled by Read Reg # (5 bits) produces Read Value]
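A Python sketch of the register file's read and write paths: a decoder produces one-hot enables so only the addressed register loads Write Value, and reading is a MUX selection. The sizes (32 registers of 32 bits, matching 5-bit register numbers) follow the slide; the class name and the `write_enable` input (like REG_WRITE in the CPU datapath) are assumptions.

```python
class RegisterFile:
    """Register file with one write port and one read port."""

    def __init__(self, num_regs=32, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def _decode(self, reg_num):
        # The DECODER: one-hot enables, one per register.
        return [1 if i == reg_num else 0 for i in range(len(self.regs))]

    def clock(self, write_reg, write_value, write_enable=True):
        """Apply one clock edge; only the enabled register loads the value."""
        if not write_enable:
            return
        value = write_value & self.mask       # Write Value fans out to every register
        for i, enable in enumerate(self._decode(write_reg)):
            if enable:
                self.regs[i] = value

    def read(self, read_reg):
        """Combinational read: the MUX passes through the addressed register."""
        return self.regs[read_reg]

rf = RegisterFile()
rf.clock(5, 123)
print(rf.read(5), rf.read(4))
```

This prints `123 0`: the write reached only register 5, even though every register saw the same Write Value on its D inputs.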

Comparing Processor Memory
Register file: intermediate data storage within the CPU; fastest; biggest area per cell.
SRAM: fast; more compact; used for caches.
DRAM: slowest but very compact, and refreshing takes time; a different technology due to the large capacitors; used for main memory.
[Figure: a 16x32 register file (W_data, W_addr, W_en, R_data, R_addr, R_en; 4-bit addresses, 32-bit data); an MxN memory implemented as SRAM or DRAM; size comparison of a register file, SRAM, and DRAM for the same number of bits (not to scale)]

RAM Internal Structure
[Figure: 1024x32 RAM with 32-bit data, addr, rw, and en; internally, an AxM decoder (A = log2 M) driven by addr and en produces word enables; each bit storage block (aka cell) connects to data, word enable, and rw; wdata(N-1) ... wdata(0) enter and rdata(N-1) ... rdata(0) leave along the bit columns]
The internal structure is similar to a register file: the decoder enables the appropriate word based on the address inputs, and rw controls whether a cell is written or read.

Static RAM (SRAM): writing
[Figure: 1024x32 RAM block; SRAM cell with a pair of cross-coupled inverters between the data and data' lines, gated by word enable]
Static RAM cell: 6 transistors (recall that an inverter is 2 transistors).
Writing this cell: the word enable input comes from the decoder. When word enable is 0, the value d loops around the inverters; that loop is where a bit stays stored. When word enable is 1, the data bit value enters the loop: data is the bit to be stored in this cell, and data' enters on the other side. The example figure shows a value being written into the cell.

Static RAM (SRAM): reading
[Figure: 1024x32 RAM block; SRAM cell with word enable, and data and data' lines connected to sense amplifiers]
When rw is set to read, the RAM logic sets both data and data' to 1. The stored bit d will pull either the left line or the right line down slightly below 1. Sense amplifiers detect which side is slightly pulled down.

Dynamic RAM (DRAM)
[Figure: 1024x32 RAM block; (a) DRAM cell: one transistor, gated by word enable, connecting the data line to a capacitor that holds d; (b) cell capacitor slowly discharging]
Dynamic RAM cell: 1 transistor (rather than 6). Relies on a large capacitor to store the bit.
Write: the transistor conducts, and the data voltage level gets stored on the top plate of the capacitor.
Read: look at the value of d.
Problem: the capacitor discharges over time. The cell must be refreshed regularly, by reading d and then writing it right back.

Memory: Storage Permanence
Traditional ROM/RAM: ROM is read only and keeps stored bits without power; RAM is read and write and loses stored bits without power.
Distinctions are blurred: advanced ROMs can be written to (e.g., EEPROM, FLASH), and advanced RAMs can hold bits without power (e.g., NVRAM).
[Figure: write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). Storage permanence: near zero, battery life (10 years), tens of years, life of product. Write ability: mask-programmed ROM (nonvolatile; during fabrication only), OTP ROM (external programmer, one time only), EPROM (external programmer, 1,000s of cycles), EEPROM (external programmer or in-system, 1,000s of cycles), FLASH (external programmer or in-system, block-oriented writes, 1,000s of cycles), NVRAM, SRAM/DRAM (in-system, fast writes, unlimited cycles); the ideal memory combines both axes]

ROM & Non-volatile memory
Erasable Programmable ROM (EPROM): uses a floating-gate transistor in each cell. The programmer uses a higher-than-normal voltage so electrons tunnel into the gate, where they become trapped. This is only done for cells that should store 0; the rest are 1. To erase, shine ultraviolet light onto the chip.
Electronically-Erasable Programmable ROM (EEPROM): programming is similar to EPROM; erasing is done one word at a time, electronically.
Flash memory: large blocks can be erased simultaneously.
Non-volatile memory (NVM):
Phase-change memory (PCM): the material changes phase (between amorphous and crystalline states) to program.
STT-RAM & MRAM: use magnetic properties to program; similar to RAM, but with slower writes.
[Figure: PCM cell with word-line and bit-line]