Register Files and Memories

ECE 554 Digital Engineering Laboratory
C. R. Kime
2/18/2002

Register Files and Memories
- Register Files
  - Issues and Objectives
  - Register File Concepts
  - Implementation of Register Files
  - Workarounds for Xilinx FPGAs
  - Bottom Line
- Memories
  - Timing Issues
  - Width Expansion

Issues and Objectives
- Issues:
  - ECE 554 projects require a broad range of register file and memory configurations.
  - ECE 554 lab boards provide very limited structures for implementing register files and memories.
- Objectives:
  - To develop techniques for implementing a broad range of register file and memory configurations by using the available lab board structures.

Register File Concepts
- Register file environments
  - Non-Pipelined
  - Pipelined
- Register File Configurations
  - Address Ports
  - Data Ports
  - Control Ports
- Timing
  - Latch
  - Flip-flop

Environment - Non-Pipelined
[Figure: register file with ports RAddr A, Rdata A, RAddr B, Rdata B, WEn, WAddr C, and Wdata C, connected to an ALU]
- Input Wdata C is not registered outside of the register file.
- Inputs WEn and WAddr C may or may not be registered.

Environment - Pipelined 1
[Figure: register file with ports RAddr A, Rdata A, RAddr B, Rdata B, WEn, WAddr C, and Wdata C, connected to an ALU]
- Register file is part of a pipeline platform.
- Inputs may or may not be registered.

Environment - Pipelined 2
[Figure: register file with ports RAddr A, Rdata A, RAddr B, Rdata B, WEn, WAddr C, and Wdata C]
- Register file is between pipeline platforms.
- Register file is not clocked; WEn controls latches => SRAM.
- Inputs may or may not be registered, but a register must be between Rdata A, Rdata B, and Wdata C.

Register File Ports
- Address: Read, Write, Shared
- Data: Input, Output, Bidirectional
- Control: Write Enable, Read/Write, Enable, Read, Write,

Register File Configurations
- Port Counts
  - Number of each of six types of address and data ports
- Control Port Types
  - Selection of types of control ports from list
- Port Associations
  - Association of address ports with data ports
  - Association of control ports with data ports

Timing
- Latch
- Flip-flop
  - Latch Pairs
  - Shared Slave Latches
  - Shared Master Latches

Latch-Based
- Latch/bit of file
- Latch control can be Write Enable and addresses or some combination of other signals and addresses
[Figure: Write Logic (WEn, Waddr, Wdata) feeding the storage array, which feeds Read Logic (Raddr, Rdata)]
- Level-sensitive write
- Setup time on write address relative to leading edge of WEn
- Hold time on write address relative to trailing edge of WEn
- Setup and hold time on write data relative to trailing edge of WEn
- Cannot be part of a pipeline platform in a single-clock (flip-flop-based) system
- Latches cannot be in a closed loop without:
  - an additional latch on a different clock in the loop, or
  - a flip-flop in the loop

Flip-flop (Latch Pair)-Based
- Flip-flop/bit of file
- Flip-flop is clocked by the clock, or by some combination of the clock and other signals, and enabled by addressing logic and a combination of other signals
[Figure: Write Logic (WEn, Waddr, Wdata) feeding the storage array, which feeds Read Logic (Raddr, Rdata)]
- Write Logic adds setup time to that for flip-flops
- Read Logic adds propagation delay to that for flip-flops
- Acts like a positive-pulse master-slave or negative-edge-triggered flip-flop register file with the above delays added

Flip-flop (Shared-Slave)-Based
- Latch/bit of file plus latch/bit of output
- Master latches are clocked by some combination of the clock and other signals and enabled by addressing logic and a combination of other signals; slave latches are clocked by a clock signal
[Figure: Write Logic (WEn, Waddr, Wdata) feeding the storage array, which feeds Read Logic (Raddr, Rdata)]

Flip-flop (Shared-Master)-Based
- Latch/bit of file plus latch/bit of input
- Master latches are clocked by some combination of the clock and other signals and enabled by addressing logic and a combination of other signals; slave latches are clocked by a clock signal
[Figure: Write Logic (WEn, Waddr, Wdata) feeding the storage array, which feeds Read Logic (Raddr, Rdata)]

Implementation of Register Files
- Custom VLSI SRAM
- Classic SRAM
- Xilinx Virtex SRAM
  - Specifications
  - Shortcomings

Custom VLSI SRAM
- Is the most flexible of all implementation techniques
- Can be used to implement any combination of the variants discussed
- Latch-based is straightforward; needs an additional rank of latches to do flip-flop-based
- Short of performance issues due to capacitance, can implement any port configuration in a single storage element array

Classic SRAM
- Has a single RWaddr port, a single Wdata port, and a single Rdata port, and is latch-based.
- Due to the single address port, can handle only one R or W access per clock cycle.
- Since it is latch-based, cannot serve as part of a pipeline platform - hence the Pipelined 2 form.
- Expansion to n R address/data ports:
  - Place n SRAMs in parallel, with the write accomplished by:
    - applying the same address to all RWaddr ports, and
    - wiring together all Wdata ports
- Expansion to m W address/data ports:
  - Add an m-way multiplexer to the address port
  - Use a clock that is m times the fundamental clock rate and multiplex the writes over m clocks

Classic SRAM (Continued)
- Addresses must be switched on the positive clock edge
- WEn must be generated from the negative clock edge and the positive clock edge
- Expansion to m W address/data ports and n R address/data ports:
  - Do both expansions above, using
    - an (m + 1)-way multiplexer, and
    - a clock that is (m + 1) times the fundamental clock rate

Virtex Distributed SelectRAM
- The SRAM capability provided in CLBs
- Can be used with the expansion methods here in classic asynchronous SRAM mode or some synchronous modes
- Getting reliable timing is tricky - may require more complex clocking!
- See Old Register File writeup on website

Virtex Block SRAM Specifications: Symbol - Single Port
[Figure: RAMB4_S# symbol with inputs WE, EN, RST, CLK, ADDR[#:0], DI[#:0] and output DO[#:0]]

Virtex Block SRAM Specifications: Symbol - Dual Port
[Figure: RAMB4_S#_S# symbol with port A inputs WEA, ENA, RSTA, CLKA, ADDRA[#:0], DIA[#:0], port B inputs WEB, ENB, RSTB, CLKB, ADDRB[#:0], DIB[#:0], and outputs DOA[#:0], DOB[#:0]]

Virtex Block SRAM Specifications: Functionality
- A WRITE operation of data DI to address ADDR occurs for WE = 1, EN = 1, RST = 0 and a positive edge on CLK. DI can also be read on DO after a delay.
- A READ operation from address ADDR occurs for WE = 0, EN = 1, RST = 0 and a positive edge on CLK.
- A RESET operation occurs on the DOA latches only, for EN = 1, RSTA = 1, and a positive edge on CLKA.
- CLK, EN, WE, and RST can also be programmed to be active low.
- Conflicts for Dual Port SRAM:
  - Simultaneous WRITEs to the same location give invalid data.
  - A simultaneous READ on the alternate port of a location being written gives invalid READ data.
  - A READ on the alternate port of a location being written may not be performed until after a clock-to-clock setup window.
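
The WRITE, READ, and RESET conditions above can be sketched as a small behavioral model. This is only an illustration in Python, not a Xilinx primitive or HDL: the class name `BlockRamPort` and its method are hypothetical, internal delays are ignored, and the behavior for RST = 1 with WE = 1 (reset the output latch without writing) is an assumption, since the slide does not cover that case.

```python
class BlockRamPort:
    """Behavioral sketch of one synchronous RAM port (hypothetical model)."""

    def __init__(self, depth, width):
        self.mem = [0] * depth            # storage array
        self.mask = (1 << width) - 1      # keeps data to the port width
        self.do = 0                       # output latch driving DO

    def clock(self, we, en, rst, addr, di=0):
        """Apply one positive clock edge and return the new DO value."""
        if not en:
            return self.do                # EN = 0: nothing happens, DO holds
        if rst:
            self.do = 0                   # RESET clears the output latch only
        elif we:
            self.mem[addr] = di & self.mask
            self.do = self.mem[addr]      # written data also appears on DO
        else:
            self.do = self.mem[addr]      # READ
        return self.do


ram = BlockRamPort(depth=256, width=16)
ram.clock(we=1, en=1, rst=0, addr=5, di=0xBEEF)   # WRITE, DO shows DI
ram.clock(we=0, en=1, rst=1, addr=5)              # RESET, DO cleared
ram.clock(we=0, en=1, rst=0, addr=5)              # READ, DO shows stored word
```

Note that every access, including a READ, needs a clock edge in this model; that is the property the later shortcomings and workarounds revolve around.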

Virtex Block SRAM Specifications: Functionality - Timing
- EN, WE, RST, ADDR, DI are captured on the positive edge of CLK in registers (unclear whether latches or flip-flops).
- WRITEs into the SRAM latch array occur later due to internal timing logic.
- READs (including those associated with writes) occur later due to internal timing logic.

Virtex Block SRAM Shortcomings
- Using Dual Port Virtex Block SRAM, with custom VLSI SRAM used as the standard for comparison.
- On a single clock cycle:
  - Maximum of two independent READ or WRITE operations
  - Maximum of two READbacks of the written value from a WRITE operation on the same port possible
  - READback of the written value from a WRITE on the alternate port not possible

Virtex Block SRAM Shortcomings (Continued)
- Additional implication of the conditions above:
  - Since the Virtex Block SRAM has two addresses, it should support operands for a binary operation:
      R[ADDRA] <= R[ADDRA] op R[ADDRB]
    for arbitrary ADDRA and ADDRB on each clock cycle. But, it does not!
  - Since it is READ-after-WRITE, the right-hand-side operands are read in clock cycle i and the left-hand-side result is written in clock cycle i+1.
  - One of the two addresses on the right-hand side for cycle i must be the same as the write address on the left-hand side for cycle i.
  - This gives an inter-operation address dependency, an architectural disaster!
- Further, the READ-after-alternate port-write problem causes the transfer
      R[ADDRy] <= R[ADDRx] op R[ADDRx]
  to be impossible to execute after a write to ADDRx.

Virtex Block SRAM Shortcomings (Continued)
- Positive edge-triggered storage of inputs to the SRAM places an implicit register in front of the SRAM:
  - Combinational READs with the address changing, for example, on both the leading and trailing edge of the clock, impossible
  - Feeding the SRAM array directly from combinational logic impossible
- Latching of outputs:
  - Combinational READs impossible

Why Did Xilinx Produce Such a Design?
- I can only guess - perhaps you have better ideas.
- Guess 1: Excessive obsession with potential timing problems
  - In terms of critical timing on signals into the SRAM, with the interconnect delay uncertainty in the FPGA, these concerns are realistic
  - Based on their past experience with customers using Distributed SRAM, although we made it work with some conservative clocking methods
  - Output latching is to make it look like true long-delay FF outputs - ridiculous requirement!
- Guess 2: The designers didn't understand the range of applications well, e.g., expectations for register files

Workarounds for Virtex FPGAs
- Absorbing input registers
- READ-after-alternate port-write
- READ port expansion
- Inter-operation address dependency removal
- WRITE port expansion
- Absorbing output latches

Absorbing Input Registers: Non-Pipelined
- Looks like a PET flip-flop-based file - no absorbing needed!
[Figure: dual-port RAMB4_S#_S# (ports A and B) connected to an ALU]

Absorbing Input Registers: Pipelined 1
- Register file is part of a pipeline platform - looks like a PET flip-flop-based file - no absorbing needed!
[Figure: dual-port RAMB4_S#_S# (ports A and B) and an ALU, with pipeline platforms P i and P j]

Absorbing Input Registers: Pipelined 2
- Register file as SRAM between pipeline platforms - input registers give an unwanted platform - must absorb into the P i and P j platforms
- Combinational logic between P i and the SRAM is now placed before P i
[Figure: dual-port RAMB4_S#_S# with pieces of pipeline platforms P i and P j distributed around its ports]

Absorbing Input Registers: Summary
- Non-pipelined - No problem
- Pipelined 1 - No problem
- Pipelined 2 - Problem
  - Handle by moving pipeline platform pieces
  - Handle by converting to Pipelined 1 form
  - Affects combinational delay distribution between stages and hence may affect pipeline performance

READ-after-alternate port-write
- Add bypass logic outside of the Virtex Block SRAM.
[Figure: dual-port RAMB4_S#_S# with a register P and 2-to-1 multiplexers, controlled by Select, on the DOA and DOB outputs]

Read Port Expansion
- Expansion to n R address/data ports:
  - Place ceiling(n/2) SRAMs in parallel, with the two writes accomplished by:
    - applying the same address to all ADDRA and the same address to all ADDRB, and
    - wiring together all DIA ports and all DIB ports
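
The read port expansion just described can be sketched in Python as a behavioral model. The class names `DualPortRam` and `ExpandedReadFile` are hypothetical, timing is reduced to one call per clock edge, and the dual-port conflict cases (simultaneous writes to one location, reading a location being written) are deliberately not modeled.

```python
import math


class DualPortRam:
    """Simplified dual-port RAM: each port either writes or reads per clock."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.doa = 0
        self.dob = 0

    def clock(self, wea, addra, dia, web, addrb, dib):
        if wea:
            self.mem[addra] = dia
        if web:
            self.mem[addrb] = dib
        self.doa = self.mem[addra]
        self.dob = self.mem[addrb]


class ExpandedReadFile:
    """n read ports built from ceiling(n/2) dual-port RAMs holding equal data."""

    def __init__(self, n_read, depth):
        self.n_read = n_read
        self.rams = [DualPortRam(depth) for _ in range(math.ceil(n_read / 2))]

    def clock(self, wea, waddra, dia, web, waddrb, dib, raddrs):
        """raddrs[2k] goes to port A of RAM k, raddrs[2k + 1] to its port B."""
        for k, ram in enumerate(self.rams):
            ra = raddrs[2 * k]
            rb = raddrs[2 * k + 1] if 2 * k + 1 < len(raddrs) else 0
            # Address muxes: the write enable selects the shared write address.
            addra = waddra if wea else ra
            addrb = waddrb if web else rb
            ram.clock(wea, addra, dia, web, addrb, dib)

    def rdata(self):
        out = []
        for ram in self.rams:
            out += [ram.doa, ram.dob]
        return out[: self.n_read]


rf = ExpandedReadFile(n_read=4, depth=16)
rf.clock(1, 1, 11, 1, 2, 22, [0, 0, 0, 0])   # two writes, broadcast to all RAMs
rf.clock(0, 0, 0, 0, 0, 0, [1, 2, 1, 2])     # four independent reads
print(rf.rdata())
```

Because every write is broadcast to all RAMs, the copies stay identical and any port can serve any read address; the cost is that a port used for a write cannot also read in that cycle.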

Read Port Expansion: Example for n = 4
[Figure: two RAMB4_S#_S# blocks in parallel; address multiplexers choose between WADDRA and RADDRA1/RADDRA2 on the A ports and between WADDRB and RADDRB1/RADDRB2 on the B ports; enables ENA1, ENB1, ENA2, ENB2; shared DIA and DIB]
- Select for all A muxes is WEA, and for all B muxes is WEB.
- All other like-named signals are connected together.

Inter-operation Address Dependency
- READ-after-WRITE can be done for one WRITE and two READs with two parallel Dual Port Block SRAMs, with READ-after-alternate port-write logic added to the READ side of both:
  - Parallel WRITE on A ports
  - Independent parallel READs on B ports
- Each additional parallel Dual Port Block SRAM adds one more READ port.
- Cannot accomplish WRITE-after-READ.
- Cannot be done for more than one active WRITE port without using WRITE Port Expansion.
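
The one-WRITE, two-READ arrangement with READ-after-alternate port-write bypass can be sketched as follows. This is a behavioral illustration under assumptions: the names `DualPortRam` and `OneWriteTwoRead` are hypothetical, `INVALID` merely stands in for the invalid READ data, and the bypass is modeled as a register holding the write data plus a registered Select that is true when the write and read addresses matched on the same clock edge, which is one plausible reading of the bypass figure.

```python
INVALID = None  # stands in for the invalid data of a conflicting READ


class DualPortRam:
    """Port A writes, port B reads; a same-address access corrupts the READ."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.dob = 0

    def clock(self, wea, addra, dia, addrb):
        conflict = bool(wea) and addra == addrb
        if wea:
            self.mem[addra] = dia
        self.dob = INVALID if conflict else self.mem[addrb]


class OneWriteTwoRead:
    """Two parallel RAMs: shared write on A ports, separate reads on B ports."""

    def __init__(self, depth):
        self.rams = [DualPortRam(depth), DualPortRam(depth)]
        self.select = [False, False]   # registered bypass selects
        self.held = 0                  # register holding the write data

    def clock(self, we, waddr, wdata, raddr0, raddr1):
        self.held = wdata
        for k, raddr in enumerate((raddr0, raddr1)):
            self.select[k] = bool(we) and waddr == raddr
            self.rams[k].clock(we, waddr, wdata, raddr)

    def rdata(self, k):
        # Bypass mux: take the held write data when the RAM output is invalid.
        return self.held if self.select[k] else self.rams[k].dob


rf = OneWriteTwoRead(depth=8)
rf.clock(we=1, waddr=2, wdata=7, raddr0=2, raddr1=0)
print(rf.rams[0].dob, rf.rdata(0), rf.rdata(1))
```

In the usage lines, the raw B-port output of the first RAM is invalid because it reads the location being written, while the bypassed read port returns the freshly written value.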

Write Port Expansion
- Requires super-clocking, in which a clock having a multiple of the frequency of the fundamental operational clock is used to serialize Block SRAM operations.
- Requires additional registers to locally enter into and return from serialized operations.
- Requires muxes that are switched by a flip-flop driven by the faster clock.

Write Port Expansion: Example - Non-Pipelined - 4 WRITE ports max
[Figure: dual-port RAMB4_S#_S# with elements labeled P i-1, P i1, P i2, P j, and 2 on its port inputs and outputs]
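
The serialization idea behind super-clocking can be sketched as below. This is a simplified Python model under assumptions: `SuperClockedWriter` is a hypothetical name, a single physical write port is used, the staging list stands in for the additional registers, and the `sub` counter stands in for the flip-flop(s) driven by the faster clock that switch the muxes.

```python
class SuperClockedWriter:
    """m logical write ports serialized onto one physical port (sketch)."""

    def __init__(self, depth, m):
        self.mem = [0] * depth
        self.m = m          # fast clock runs at m times the fundamental rate
        self.staged = []    # registers holding this cycle's write requests
        self.sub = 0        # sub-cycle counter clocked by the fast clock

    def load(self, writes):
        """Fundamental clock edge: capture up to m (we, addr, data) requests."""
        if len(writes) > self.m:
            raise ValueError("more writes than sub-cycles")
        self.staged = list(writes)
        self.sub = 0

    def fast_clock(self):
        """One fast clock edge: the mux picks request number `sub`."""
        if self.sub < len(self.staged):
            we, addr, data = self.staged[self.sub]
            if we:
                self.mem[addr] = data
        self.sub = (self.sub + 1) % self.m

    def fundamental_cycle(self, writes):
        self.load(writes)
        for _ in range(self.m):
            self.fast_clock()


w = SuperClockedWriter(depth=16, m=4)
w.fundamental_cycle([(1, 0, 10), (1, 1, 11), (0, 2, 99), (1, 3, 13)])
print(w.mem[:4])
```

In this sketch, writes to the same address within one fundamental cycle resolve in sub-cycle order, so the request in the later slot wins; a real design would need to define that priority explicitly.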

Absorbing Output Latches
- The output latch is a part of the attempt at a flip-flop appearance for the SRAM operation.
- As such, there appears to be no way to explicitly work around it.
- Other workarounds handle its effects.

The Bottom Line
Overall, it appears that the best approach is to:
- Use a Non-Pipelined or Pipelined 1 structure
- Use the Inter-operation Dependency solution to achieve multiple dependency-free READs
- Use WRITE Port Expansion for multiple WRITEs
- Use the READ-after-alternate port-write to get READ-after-WRITE capability
- Use WRITE Port Expansion with READs on early subcycles to get WRITE-after-READ capability
- Be cognizant of substantial setup times and delays for the synchronous operations
- Feel free to experiment with other approaches and apply the ideas given to other Virtex Block SRAM uses

Memories
- Timing Issues
- Width Expansion

Timing Issues
- The off-board SRAMs are asynchronous and have typical signal timing requirements.
- See AS7C4096 datasheets for timing parameters:
  - Address-controlled READ is easy
  - WE-controlled WRITE has zero setup and hold times, which look easy, but read on
- Due to unpredictable FPGA timing, the timing of memory signals, particularly for WRITE, should be verified.
- In the worst case, may need to use super clocking to get reliable timing.

Width Expansion
- Width expansion can be achieved by using super clocking, with an implementation similar to that for register file write expansion.
- To expand a 16-bit word to a 16n-bit word requires super clocking at n times the fundamental rate.

Width Expansion: Implementation
- For address-controlled READs, straightforward
- Not recommended, although feasible, for WRITEs:
  - Must be trailing edges on, for example, WE, for each of the super clock cycles
  - This will require changes on negative as well as positive super clock edges
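
As a rough illustration of why a 16n-bit word costs n super-clock cycles, the sketch below splits a wide word into n 16-bit pieces, one memory access per loop iteration. The function names and the address mapping (n consecutive locations per wide word, least significant piece first) are assumptions for illustration only; the slides do not specify a layout.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def write_wide(mem, index, value, n):
    """Store a 16n-bit value as n 16-bit words; one iteration per super clock."""
    for i in range(n):
        mem[index * n + i] = (value >> (WORD_BITS * i)) & WORD_MASK


def read_wide(mem, index, n):
    """Reassemble a 16n-bit value from n 16-bit reads."""
    value = 0
    for i in range(n):
        value |= mem[index * n + i] << (WORD_BITS * i)
    return value


mem = [0] * 16                              # 16-bit-wide memory
write_wide(mem, 1, 0x123456789ABC, n=3)     # one 48-bit word, three accesses
print(hex(read_wide(mem, 1, n=3)))
```

Each loop iteration corresponds to one super-clock cycle, so the fundamental clock period must cover n memory access times.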

Postscript
- The workarounds do not consider:
  - Multiple clock edge use instead of superclocking
  - Different clock edges on the two ports of a dual port SelectRAM
- These techniques can potentially be beneficial to the degree that the resulting constructs:
  - are synthesizable, and
  - do not adversely affect performance