CS 152 Computer Architecture and Engineering


CS 152 Computer Architecture and Engineering Lecture 12 Memory and Interfaces 2006-10-10 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

Last Time: Storing a Bit as Q = CV. State is coded as the amount of energy stored by a device (figure: a capacitor charged to 1.5V). State is read by sensing the amount of energy. Problems: noise changes Q (up or down), parasitics leak or source Q. Fortunately, Q cannot change instantaneously, but that only gets us in the ballpark.

Last Time: Storing Bits Reliably. Store more energy than we expect from the noise: Q = CV, so to store more charge, use a bigger V or make a bigger C. Cost: power, chip size. Represent state as charge in ways that are robust to noise. Example: 1 bit per capacitor. Write 1.5 volts on C. To read C, measure V: V > 0.75 volts is a 1, V < 0.75 volts is a 0. Cost: could have stored many bits on that capacitor. Correct small state errors that are introduced by noise. Example: read C every 1 ms. Is V > 0.75 volts? Write back 1.5V (yes) or 0V (no). Cost: complexity.
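The read-and-write-back example on this slide can be sketched in a few lines of Python. This is only an illustration, not the lecture's circuit: the names `read_bit` and `refresh` and the drifted voltages are assumptions, and only the 1.5 V write level and the 0.75 V threshold come from the slide.

```python
V_WRITE = 1.5       # volts written for a 1 (from the slide)
V_THRESHOLD = 0.75  # read threshold in volts (from the slide)

def read_bit(v):
    """Interpret a capacitor voltage as a bit."""
    return 1 if v > V_THRESHOLD else 0

def refresh(v):
    """Read the cell, then write back a full-strength value."""
    return V_WRITE if read_bit(v) == 1 else 0.0

# Hypothetical drift: a stored 1 has leaked down to 1.1 V,
# and a stored 0 has picked up 0.3 V of noise.
cells = [1.1, 0.3]
cells = [refresh(v) for v in cells]
print(cells)  # [1.5, 0.0]
```

As long as noise and leakage move the voltage by less than 0.75 V between refreshes, the stored bit is restored exactly.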

Last Time: 1-T DRAM cells. (Figure: schematic and die cross section of the cell; labels: Bit Line, Word Line, Vdd, Capacitor, n+, p-, oxide.) Word Line and Vdd run on z-axis. Why Vcap values start out at ground: diode leakage current.

Today: Memory Technology Wrap-Up. Static Memory Circuits: for SRAM memory cells and for flip-flops. Memory Arrays: row decoders, column sense amps, array sizing. DRAM Interfaces: how the SDRAM chips on the Calinx board work.

Inverters

Last Time: Model for off transistor... (Figure: n-fet cross section with Vd = 1V, Vg = 0.2V, Vs = Vsub = 0V, and I on the order of nA; electron energy diagram across the n+ region, the gate wall, and the other n+ region.) Ids = Io [exp((κVg - Vs)/Vo)] [1 - exp(-Vds/Vo)]. The first bracket is the exponential dependence on Vg; the second bracket is about 1 if Vds > 70mV. Io is about 100fA, Vo = kT/q = 25mV, κ = 0.7. Current flows when electrons diffuse to the top of the gate wall. The number of electrons that reach the top goes up as the wall comes down, which implies Ids goes as exp(Vg).

Last Time: Transistor Off Current. Ids = Io [exp((κVg - Vs)/Vo)] [1 - exp(-Vds/Vo)]. (Figure: plot of Ids versus Vg; marked points: Ioff about 10 nA, Vt near 0.25, Vdd = 0.7, Ion = 1.2 mA.)
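To get a feel for the exponential dependence on Vg, the off-current formula can be evaluated directly. The sketch below uses the constants quoted on the previous slide (Io about 100 fA, Vo = 25 mV, κ = 0.7); the function name `ids_off` and the chosen gate voltages are illustrative assumptions.

```python
import math

I0 = 100e-15   # Io, about 100 fA (from the slide)
V0 = 0.025     # Vo = kT/q = 25 mV (from the slide)
KAPPA = 0.7    # kappa (from the slide)

def ids_off(vg, vs, vd):
    """Ids = Io * exp((kappa*Vg - Vs)/Vo) * (1 - exp(-Vds/Vo))."""
    vds = vd - vs
    return I0 * math.exp((KAPPA * vg - vs) / V0) * (1.0 - math.exp(-vds / V0))

# Raising Vg by 100 mV multiplies the current by exp(0.7 * 0.1 / 0.025) = exp(2.8).
ratio = ids_off(0.3, 0.0, 1.0) / ids_off(0.2, 0.0, 1.0)
print(round(ratio, 1))  # 16.4

# The second bracket is already close to 1 at Vds = 70 mV.
print(round(1.0 - math.exp(-0.070 / V0), 2))  # 0.94
```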

Last Time: Model for on transistor... (Figure: n-fet cross section with Vd = 2V, Vg = 1V, Vs = Vsub = 0V, and I on the order of µA; positive charge on the gate and negative charge in the channel, separated by the dielectric.) Ids = (carriers in channel) / (transit time), where the carriers in the channel follow Q = CV and the transit time is f(length, velocity). Ids = [(µεW)/(LD)] [Vgs - Vth] [Vds]. If Vds > Vgs - Vth, channel physics change: Ids = [(µεW)/(2LD)] [Vgs - Vth]^2. W = transistor width, L = length, D = capacitor plate distance; µ is velocity, ε is the dielectric constant of C.

Inverters: Circuits and Layout. (Figure: inverter symbol, transistor schematic, and layout; labels: Vdd, Vin, Vout.)

Inverter: Die Cross Section. (Figure: cross section of the n-fet and p-fet; labels: Vin, Vout, oxide, n+, p+, p-, n-well.)

Inverters: n-fet Transistor Equation. (Figure: inverter schematic with the n-fet terminals Vd, Vg, Vs and the current Ids marked.) If Vgs > Vt and Vds > Vgs - Vt: Ids = (k/2) (W/L) [Vgs - Vt]^2. Otherwise, if Vgs > Vt: Ids = k (W/L) [Vgs - Vt] [Vds]. Otherwise: Ids is about 0, but really Ids = Io [exp((κVg - Vs)/Vo)] [1 - exp(-Vds/Vo)]. Note: Vt is the transistor threshold, formerly written Vth. Also, Vt is actually a function of Vs: Vt(Vs) goes as sqrt(Vs).
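The three cases on this slide translate directly into a piecewise function. The sketch below copies the slide's expressions as written and treats the off-state leakage as exactly 0; the function name `nfet_ids` and the numeric values for k, W/L, and Vt are hypothetical, chosen only to exercise each branch.

```python
def nfet_ids(vgs, vds, vt, k, w_over_l):
    """Piecewise n-fet current, following the slide's three cases."""
    if vgs <= vt:
        return 0.0  # off: Ids is about 0 (leakage ignored here)
    if vds > vgs - vt:
        # First case on the slide: Vgs > Vt and Vds > Vgs - Vt.
        return (k / 2.0) * w_over_l * (vgs - vt) ** 2
    # Second case on the slide: Vgs > Vt and Vds <= Vgs - Vt.
    return k * w_over_l * (vgs - vt) * vds

# Hypothetical device: Vt = 0.25 V, k = 1e-4 A/V^2, W/L = 2.
for vgs, vds in [(0.1, 1.0), (1.0, 1.0), (1.0, 0.1)]:
    print(vgs, vds, nfet_ids(vgs, vds, vt=0.25, k=1e-4, w_over_l=2.0))
```

The p-fet version on the next slide has the same shape with Vsg, Vsd, and Isd in place of Vgs, Vds, and Ids.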

Inverters: p-fet Transistor Equation. (Figure: inverter schematic with the p-fet terminals Vs, Vg, Vd and the current Isd marked.) If Vsg > Vt and Vsd > Vsg - Vt: Isd = (k/2) (W/L) [Vsg - Vt]^2. Otherwise, if Vsg > Vt: Isd = k (W/L) [Vsg - Vt] [Vsd]. Otherwise: Isd is about 0, but again, in reality there is a leakage current. Note: Vt for p-fet and n-fet are different. Also true for k (fab constant): kp < kn, due to electrons being faster than holes.

Inverters with Vin = Gnd, Vout = Vdd. For the p-fet: Is Vsg > Vt? Is Vsd > Vsg - Vt once Vout is Vdd? Isd = k (W/L) [Vsg - Vt] [Vsd]. The [Vsd] term goes as close to 0 as it can while still supplying the leakage current. For the n-fet: Ids is about 0, but really a small leakage current.

Inverters with Vin = Vdd, Vout = Gnd. For the p-fet: Isd is about 0, but really a small leakage current. For the n-fet: Is Vgs > Vt? Is Vds > Vgs - Vt once Vout is Gnd? Ids = k (W/L) [Vgs - Vt] [Vds]. The [Vds] term goes as close to 0 as it can while still supplying the leakage current.

Calculating the inverter threshold (Vth). Tie output to input. (Figure: inverter with Vout tied to Vin, both nodes labeled Vth.) Assume voltage is somewhere near the middle. For nfet, is Vds > Vgs - Vt? For pfet, is Vsd > Vsg - Vt? No, by definition! Use: Ids = kn (W/L) [Vth - Vtn] [Vth] and Isd = kp (W/L) [Vdd - Vth - Vtp] [Vdd - Vth] to compute the exact voltage in the middle.

Question: What happens when... (Figure: two inverter schematics with Vin, Vout, and the transistor terminals and currents labeled.) Stays at Vth until a tiny amount of Vin noise appears. Then output goes to Vdd or Gnd until Vin noise flips it back the other way. Lesson: at Vth, small dVin makes big dVout.

Static Memory Circuits. Dynamic Memory: Circuit remembers for a fraction of a second. Static Memory: Circuit remembers as long as the power is on. Non-volatile Memory: Circuit remembers for many years, even if power is off.

Recall DRAM cell: 1 T + 1 C. (Figure: cell schematic with Word Line, Bit Line, and Vdd, next to an array view in which word lines run along rows and bit lines along columns.)

Idea: Store each bit with its complement. (Figure: a Row line, bit lines x and x-bar, and storage nodes y and y-bar holding Gnd and Vdd, or Vdd and Gnd.) Why? We can use the redundant representation to compensate for noise and leakage.

Case #1: y = Gnd, y-bar = Vdd... (Figure: Row line, bit lines x and x-bar, nodes y = Gnd and y-bar = Vdd, with currents Isd and Ids marked.)

Case #2: y = Vdd, y-bar = Gnd... (Figure: Row line, bit lines x and x-bar, nodes y = Vdd and y-bar = Gnd, with currents Isd and Ids marked.)

Combine both cases to complete circuit. (Figure: cross-coupled inverters between nodes y and y-bar, with bit lines x and x-bar; annotations: Gnd, Vdd, Vth, noise.)
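A toy numerical model can illustrate why cross-coupled inverters hold a bit: a node sitting exactly at Vth stays there, but any small offset is amplified on each trip around the loop until the node reaches Vdd or Gnd. Everything in this sketch is an assumption made for illustration: the supply of 1.0 V, the threshold of 0.5 V, and the smooth `inverter` transfer curve are not taken from the slides.

```python
import math

VDD = 1.0          # hypothetical supply voltage
VTH = 0.5          # hypothetical inverter threshold
STEEPNESS = 0.05   # hypothetical width of the transition region, in volts

def inverter(vin):
    """Idealized inverter: near Vdd for low inputs, near Gnd for high inputs."""
    return VDD / (1.0 + math.exp((vin - VTH) / STEEPNESS))

def settle(y, rounds=20):
    """Pass the node voltage around the two-inverter loop repeatedly."""
    for _ in range(rounds):
        y = inverter(inverter(y))
    return y

print(settle(0.50))            # stays at 0.5: balanced exactly at Vth
print(round(settle(0.51), 3))  # 10 mV of noise above Vth drives the node toward Vdd
print(round(settle(0.49), 3))  # 10 mV below Vth drives it toward Gnd
```

The same amplification that makes the Vth point unstable is what restores a slightly degraded Vdd or Gnd level back to full strength.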

SRAM Challenge #1: It's so big! SRAM area is 6X-10X DRAM area, same generation... Cell has both transistor types. Vdd AND Gnd. More contacts, more devices, two bit lines... Capacitors are usually parasitic capacitance of wires and transistors.

Challenge #2: Writing is a fight. When word line goes high, bitlines fight with cell inverters to flip the bit -- must win quickly! Solution: tune W/L of cell & driver transistors. (Figure: one cell node with initial state Vdd while its bitline drives Gnd; the other with initial state Gnd while its bitline drives Vdd.)

Challenge #3: Preserving state on read. When word line goes high on read, cell inverters must drive large bitline capacitance quickly, to preserve state on its small cell capacitances. (Figure: cell state Vdd on one side and Gnd on the other; each bitline is a big capacitor.)

SRAM vs DRAM, pros and cons. Big win for DRAM: DRAM has a 6-10X density advantage at the same technology generation. SRAM advantages: SRAM has deterministic latency, since its cells do not need to be refreshed. SRAM is much faster: transistors drive bitlines on reads. SRAM is easy to design in a logic fabrication process (and premium logic processes have SRAM add-ons).

Flip Flops Revisited

Recall: Static RAM cell (6 Transistors). (Figure: cross-coupled inverters with bit lines x and x-bar; annotations: Gnd, Vdd, Vth, noise.)

Recall: Positive edge-triggered flip-flop. A flip-flop samples right before the edge, and then holds value. (Figure: D input, sampling circuit, circuit that holds value, Q output.) 16 Transistors: Makes an SRAM look compact! What do we get for the 10 extra transistors? Clocked logic semantics.

Sensing: When clock is low. A flip-flop samples right before the edge, and then holds value. (Figure: D input, sampling circuit, circuit that holds value, Q output; switch states annotated clk = 0 and clk = 1.) Sampling circuit: will capture new value on posedge. Holding circuit: outputs last value captured.

Capture: When clock goes high. A flip-flop samples right before the edge, and then holds value. (Figure: D input, sampling circuit, circuit that holds value, Q output; switch states annotated clk = 1 and clk = 0.) Sampling circuit: remembers value just captured. Holding circuit: outputs value just captured.
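The clocked semantics described on the last two slides can be modeled behaviorally: a sampling stage that follows D while the clock is low, and a holding stage that takes over the sampled value on the rising edge. This is a sketch at the logic level only, not the 16-transistor circuit; the class name `DFlipFlop` and its `step` method are assumptions made for illustration.

```python
class DFlipFlop:
    """Behavioral positive edge-triggered flip-flop built from two stages."""

    def __init__(self):
        self.sampled = 0  # state of the sampling circuit
        self.q = 0        # state of the holding circuit (the Q output)
        self.clk = 0      # previous clock level, used to detect the edge

    def step(self, clk, d):
        if clk == 0:
            self.sampled = d       # clock low: sampling circuit follows D
        elif self.clk == 0:
            self.q = self.sampled  # rising edge: output takes the sampled value
        self.clk = clk
        return self.q

ff = DFlipFlop()
trace = [ff.step(clk, d) for clk, d in [(0, 1), (1, 0), (1, 0), (0, 0), (1, 1)]]
print(trace)  # [0, 1, 1, 1, 0]
```

In the last step D changes to 1 at the same moment the clock rises, yet Q becomes 0: the value captured is the one sampled right before the edge.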

Admin: Final Xilinx Checkoff Friday... Lab report due Monday, 11:59 PM.

Memory Arrays. Calinx DRAM: 133 MHz, 128 Mb. (Data sheet excerpt: SYNCHRONOUS DRAM, 128Mb: x4, x8, x16 SDRAM. MT48LC32M4A2: 8 Meg x 4 x 4 banks. MT48LC16M8A2: 4 Meg x 8 x 4 banks. MT48LC8M16A2: 2 Meg x 16 x 4 banks. For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds) Data sheet on resources page. Will need to understand for final project!

Last Time: 1-T DRAM cell. (Figure: schematic and die cross section of the cell; labels: Bit Line, Word Line, Vdd, Capacitor, n+, p-, oxide.) Word Line and Vdd run on z-axis. Why Vcap values start out at ground: diode leakage current.

Last Time: DRAM Read is Destructive. (Figure: bit line, initialized to a low voltage; word line going 0 -> Vdd; stored charge from the cell moving onto the bit line, so Vc -> 0.) Raising the word line removes the charge from every cell it connects to! Must write back after each read.

Last Time: DRAM Refresh... Parasitic currents leak away charge (diode leakage...). Solution: Refresh, by reading cells at regular intervals (tens of milliseconds). (Figure: cell cross section; labels: Bit Line, Word Line, Vdd, n+, p-, oxide.)

People buy DRAM for the bits. Edge circuits are overhead. So, we amortize the edge circuits over big arrays. (Figure: cell array; bit lines run along columns, word lines along rows.)

A bank of 32 Mb (128Mb chip -> 4 banks). 12-bit row address input to a 1 of 4096 decoder. 4096 rows, 2048 columns, 33,554,432 usable bits (tester found good bits in bigger array). 2048 bits delivered by sense amps. Select requested bits, send off the chip.
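The sizes on this slide can be cross-checked against the three part organizations listed in the data sheet excerpt. The sketch below is plain arithmetic; the tuple layout and the helper name `bank_bits` are assumptions made for illustration.

```python
MEG = 2 ** 20

# (part number, words per bank, bits per word, banks), from the data sheet excerpt.
PARTS = [
    ("MT48LC32M4A2", 8 * MEG, 4, 4),
    ("MT48LC16M8A2", 4 * MEG, 8, 4),
    ("MT48LC8M16A2", 2 * MEG, 16, 4),
]

def bank_bits(words, width):
    """Number of bits stored in one bank."""
    return words * width

for name, words, width, banks in PARTS:
    bits = bank_bits(words, width)
    print(name, bits, "bits per bank,", bits * banks // MEG, "Mb per chip")

ROW_ADDRESS_BITS = 12
rows = 2 ** ROW_ADDRESS_BITS
print(rows)                # 4096 rows selected by the 1 of 4096 decoder
print(33_554_432 // rows)  # 8192 bits stored per row
```

Every organization gives 33,554,432 bits per bank and 128 Mb per chip. Note that 4096 rows of 2048 columns gives 8 Meg locations, so for the x4 part each column address would select 4 bits; that reading is an inference from the data sheet excerpt, not something the slide states.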

Recall DRAM Challenge #3b: Sensing. How do we reliably sense a 60mV signal? Compare the word line against the voltage on [...] a dummy word line. (Figure: sense amp with the word line to sense on its + input and the dummy word line on its - input.) Dummy word line: cells hold no charge.

Slow! This 7.5ns period DRAM (133 MHz) can do row reads at only 75 ns (about 13 MHz). Plus, need to add selection time. DRAM has high latency to first bit out. A fact of life. (Figure: the 32 Mb bank array, with the annotation "corresponds to row read into sense amps": 12-bit row address input to a 1 of 4096 decoder; 4096 rows, 2048 columns, 33,554,432 usable bits (tester found good bits in bigger array); 2048 bits delivered by sense amps; select requested bits, send off the chip.)

An ill-timed refresh may add to latency. Parasitic currents leak away charge (diode leakage...). Solution: Refresh, by reading cells at regular intervals (tens of milliseconds). (Figure: cell cross section; labels: Bit Line, Word Line, Vdd, n+, p-, oxide.)

Latency is not the same as bandwidth! What if we want all of the 2048 bits? In row access time (75 ns) we can do 10 transfers at 133 MHz. 8-bit chip bus -> 10 x 8 = 80 bits << 2048. Now the row access time looks fast! Thus, push to faster DRAM interfaces. (Figure: the 32 Mb bank array: 12-bit row address input to a 1 of 4096 decoder; 4096 rows, 2048 columns, 33,554,432 usable bits (tester found good bits in bigger array); 2048 bits delivered by sense amps; select requested bits, send off the chip.)
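The arithmetic on this slide can be written out as follows. The constants are the slide's (75 ns row access, 7.5 ns clock period, 8-bit bus, 2048 bits per row); the variable names, and the assumption of one bus transfer per clock cycle, are illustrative.

```python
import math

ROW_ACCESS_NS = 75.0    # row access time (from the slide)
CLOCK_PERIOD_NS = 7.5   # 133 MHz clock (from the slide)
BUS_WIDTH_BITS = 8      # 8-bit chip bus (from the slide)
ROW_BITS = 2048         # bits delivered by the sense amps (from the slide)

# How much data could cross the bus during one row access?
transfers_per_row_access = int(ROW_ACCESS_NS / CLOCK_PERIOD_NS)
bits_per_row_access = transfers_per_row_access * BUS_WIDTH_BITS
print(transfers_per_row_access, bits_per_row_access)  # 10 80

# How long does it take to move a whole row off the chip, one transfer per cycle?
transfers_for_full_row = math.ceil(ROW_BITS / BUS_WIDTH_BITS)
full_row_transfer_ns = transfers_for_full_row * CLOCK_PERIOD_NS
print(transfers_for_full_row, full_row_transfer_ns)   # 256 1920.0
```

Moving the full row takes 1920 ns, more than 25 times the 75 ns row access, which is why the off-chip interface becomes the bottleneck once a row is open.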

Sadly, it's rarely this good... What if we want all of the 2048 bits? The "we" for a CPU would be the program running on the CPU. Recall Amdahl's law: If 20% of the memory accesses need a new row access... not good. (Figure: the 32 Mb bank array: 12-bit row address input to a 1 of 4096 decoder; 4096 rows, 2048 columns, 33,554,432 usable bits (tester found good bits in bigger array); 2048 bits delivered by sense amps; select requested bits, send off the chip.)
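A rough estimate shows why 20% is "not good". The sketch below assumes, purely for illustration, that an access to an already open row costs one 7.5 ns clock period and that an access needing a new row costs the full 75 ns row access time; the function name `average_access_ns` and this cost model are not from the slides.

```python
def average_access_ns(new_row_fraction, row_access_ns=75.0, open_row_ns=7.5):
    """Weighted average access time under the simple two-cost model above."""
    return (new_row_fraction * row_access_ns
            + (1.0 - new_row_fraction) * open_row_ns)

avg = average_access_ns(0.2)
print(round(avg, 1))               # 21.0 ns on average
print(round(avg / 7.5, 1))         # 2.8 times slower than the best case
print(round(0.2 * 75.0 / avg, 2))  # 0.71 of the time is spent opening rows
```

Under these assumptions the slow 20% of accesses account for about 71% of the total time, which is the Amdahl's law effect the slide points to.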

DRAM latency/bandwidth chip features. Columns: Design the right interface for CPUs to request the subset of a column of data it wishes (2048 bits delivered by sense amps; select requested bits, send off the chip). Interleaving: Design the right interface to the 4 memory banks on the chip (Bank 1, Bank 2, Bank 3, Bank 4), so several row requests run in parallel.

Off-chip interface for the Micron part... A clocked bus protocol (133 MHz). DRAM is controlled via commands (READ, WRITE, REFRESH,...). Synchronous data output. CAS Latency = 2 (CAS = Column Address Strobe). Note! This example is best-case! To access a new row, a slow ACTIVE command must run before the READ. (Figure: timing diagram over clock cycles T0-T3; COMMAND is READ, NOP, NOP; DQ carries DOUT, with timing parameters tLZ, tOH, and tAC. From Micron 128 Mb SDRAM data sheet, on resources web page.)

Opening a row before reading... (Figure: data sheet timing diagram over clock cycles T0-T8. Signals: CLK, CKE, COMMAND, DQM / DQML, DQMH, A0-A9, A11, A10, BA0, BA1, DQ. COMMAND sequence: ACTIVE, NOP, NOP, NOP, READ, NOP, NOP, ACTIVE, NOP. A0-A9, A11 carry ROW, then COLUMN m, then ROW; A10 carries ROW, then ENABLE AUTO PRECHARGE, then ROW; BA0, BA1 carry BANK each time; DQ carries DOUT m. Timing parameters shown: tCK, tCL, tCH, tCKS, tCKH, tCMS, tCMH, tAS, tAH, tRCD, tRAS, tRC, tRP, tAC, tOH, tHZ. Annotations: 44 ns + CAS Latency; 6 ns; 70 ns between row opens.)

Interleave: Access all 4 banks in parallel. (Figure 8, Random READ Accesses, from the data sheet: over clock cycles T0-T6, COMMAND is READ, READ, READ, READ, NOP, NOP, NOP; ADDRESS is BANK, COL n; BANK, COL a; BANK, COL x; BANK, COL m; DQ carries DOUT n, DOUT a, DOUT x, DOUT m. CAS Latency = 3.) NOTE: Each READ command may be to any bank. DQM is LOW.
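The pipelining in this figure can be sketched as a small schedule: four READ commands issued on consecutive cycles, each returning its data a fixed number of cycles later. The sketch assumes, as a simplification of the data sheet timing, that data appears exactly CAS latency cycles after its READ; the function name `read_schedule` is hypothetical.

```python
CAS_LATENCY = 3  # from the figure

def read_schedule(columns, cas_latency=CAS_LATENCY):
    """Map each READ, issued on consecutive cycles, to (issue cycle, data cycle)."""
    return {col: (issue, issue + cas_latency)
            for issue, col in enumerate(columns)}

schedule = read_schedule(["n", "a", "x", "m"])
for col, (issue, out) in schedule.items():
    print(f"READ col {col}: issued at T{issue}, DOUT {col} at T{out}")
```

The four results arrive on T3 through T6, one per cycle. Once the pipeline is full the bus delivers data every clock, even though each individual READ still sees the full CAS latency.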

Lectures: Coming up next... Essential tools for the final project.