CS 152 Computer Architecture and Engineering Lecture 12 Memory and Interfaces 2006-10-10 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/
Last Time: Storing a Bit as Q = CV
State is coded as the amount of energy stored by a device. State is read by sensing the amount of energy.
Problems: noise changes Q (up or down), parasitics leak or source Q. Fortunately, Q cannot change instantaneously, but that only gets us in the ballpark.
[Figure: capacitor plates holding + and - charge, labeled 1.5V.]
Last Time: Storing Bits Reliably
Store more energy than we expect from the noise. Q = CV: to store more charge, use a bigger V or make a bigger C. Cost: power, chip size.
Represent state as charge in ways that are robust to noise. Example: 1 bit per capacitor. Write 1.5 volts on C. To read C, measure V. V > 0.75 volts is a 1. V < 0.75 volts is a 0. Cost: could have stored many bits on that capacitor.
Correct small state errors that are introduced by noise. Example: read C every 1 ms. Is V > 0.75 volts? Write back 1.5V (yes) or 0V (no). Cost: complexity.
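The read-and-write-back rule on this slide can be sketched in a few lines of Python. This is only an illustration: the 1.5 V and 0.75 V values come from the slide, while the function names and the sample voltages are hypothetical.

```python
V_WRITE = 1.5       # volts written to store a 1 (from the slide)
V_THRESHOLD = 0.75  # read threshold in volts (from the slide)

def read_bit(v):
    """Interpret a capacitor voltage as a bit."""
    return 1 if v > V_THRESHOLD else 0

def refresh(v):
    """Read the bit, then write back a full-strength level."""
    return V_WRITE if read_bit(v) else 0.0

# Hypothetical voltages after noise and leakage have acted on the cell.
for v in (1.5, 1.1, 0.8, 0.4, 0.0):
    print(f"V = {v:.2f} -> bit {read_bit(v)}, rewritten to {refresh(v):.1f} V")
```

As long as noise moves the voltage by less than 0.75 V between refreshes, the stored bit survives.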
Last Time: 1-T DRAM cells
[Figure: 1-T DRAM cell schematic (bit line, word line, capacitor, Vdd) and die cross section (n+ regions in a p- substrate, under oxide). Word Line and Vdd run on z-axis.]
Why Vcap values start out at ground. [Figure labels: Vdd, Vcap, diode leakage current.]
Today: Memory Technology Wrap-Up Static Memory Circuits: For SRAM memory cells and for flip-flops. Memory Arrays: Row decoders, column sense amps, array sizing. DRAM Interfaces: How the SDRAM chips on the Calinx board work.
Inverters
Last Time: Model for off transistor...
Bias: Vd = 1V, Vg = 0.2V, Vs = Vsub = 0V. I ≈ nA.
Ids = Io [exp((κVg - Vs)/Vo)] [1 - exp(-Vds/Vo)]
The first factor is the exponential dependence on Vg. The second factor is ≈ 1 if Vds > 70mV.
Io ≈ 100fA, Vo = kT/q = 25mV, κ = 0.7
Current flows when electrons diffuse to the top of the gate wall. The number of electrons that reach the top goes up as the wall comes down, which implies Ids ~ exp(Vg).
[Figure: electron energy diagram between the two n+ regions, with the wall height set by Vg; n-fet cross section (n+, dielectric, p-) and symbol showing Vg, Vd, Vs, Ids.]
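As a rough numerical companion to the off-transistor model, the sketch below evaluates the slide's expression in Python. The constants are the slide's approximate values (Io about 100 fA, Vo = 25 mV, κ = 0.7); the function name `ids_off` is a hypothetical label.

```python
import math

I0 = 100e-15   # amps, approximate value from the slide
V0 = 0.025     # volts, kT/q from the slide
KAPPA = 0.7    # from the slide

def ids_off(vg, vs, vds):
    """Subthreshold current: Io * exp((k*Vg - Vs)/Vo) * (1 - exp(-Vds/Vo))."""
    gate_term = math.exp((KAPPA * vg - vs) / V0)
    drain_term = 1.0 - math.exp(-vds / V0)
    return I0 * gate_term * drain_term

# The drain factor is already close to 1 at Vds = 70 mV.
print(1.0 - math.exp(-0.070 / V0))

# Each extra 100 mV on the gate multiplies the current by exp(0.07 / 0.025).
print(ids_off(0.2, 0.0, 1.0) / ids_off(0.1, 0.0, 1.0))
```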
Last Time: Transistor Off Current
Ids = Io [exp((κVg - Vs)/Vo)] [1 - exp(-Vds/Vo)]
[Figure: plot of Ids versus Vg. Axis marks: Vt at 0.25 V and Vdd at 0.7 V; Ioff ≈ 10 nA and Ion = 1.2 mA.]
Last Time: Model for on transistor...
Bias: Vd = 2V, Vg = 1V, Vs = Vsub = 0V. I ≈ µA.
Ids = (carriers in channel) / (transit time). Carriers in channel: Q = CV. Transit time: f(length, velocity).
Ids = [(µεW)/(LD)] [Vgs - Vth] [Vds]
If Vds > Vgs - Vth, channel physics change: Ids = [(µεW)/(2LD)] [Vgs - Vth]^2
W = transistor width, L = length, D = capacitor plate distance. µ is velocity, ε is the dielectric constant of C.
[Figure: n-fet cross section with channel charge under the gate dielectric, and symbol showing Vg, Vd, Vs, Ids.]
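Inverters: Circuits and Layout
[Figure: inverter symbol (Vin, Vout), transistor schematic (Vin, Vout, Vdd), and layout.]
Inverter: Die Cross Section
[Figure: die cross section. n-fet: n+ regions in the p- substrate under oxide. p-fet: p+ regions in an n-well with an n+ contact. Vin and Vout are labeled on both devices.]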
Inverters: n-fet Transistor Equation
If Vgs > Vt and Vds > Vgs - Vt: Ids = (k/2) (W/L) [Vgs - Vt]^2
Otherwise, if Vgs > Vt: Ids = k (W/L) [Vgs - Vt] [Vds]
Otherwise: Ids ≈ 0, but really Ids = Io [exp((κVg - Vs)/Vo)] [1 - exp(-Vds/Vo)]
Note: Vt is the transistor threshold, formerly written Vth. Also, Vt is actually a function of Vs: Vt(Vs) ~ sqrt(Vs).
[Figure: n-fet in the inverter, with Vin on Vg, Vout on Vd, and Ids flowing to Vs.]
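The three-case n-fet equation can be written as one piecewise Python function. This sketch follows the slide's simplified expressions literally and returns 0 in the off region (ignoring leakage); the default values for `vt`, `k`, and `w_over_l` are illustrative assumptions, not values from the lecture.

```python
def nfet_ids(vgs, vds, vt=0.25, k=1.0e-4, w_over_l=1.0):
    """Piecewise n-fet current using the slide's simplified equations.

    vt, k, and w_over_l are placeholder values chosen for illustration.
    """
    if vgs <= vt:
        return 0.0                                   # off (leakage ignored)
    overdrive = vgs - vt
    if vds > overdrive:
        return 0.5 * k * w_over_l * overdrive ** 2   # first case on the slide
    return k * w_over_l * overdrive * vds            # second case on the slide

print(nfet_ids(0.2, 1.0))   # gate below threshold
print(nfet_ids(1.0, 1.0))   # Vds > Vgs - Vt
print(nfet_ids(1.0, 0.1))   # Vds < Vgs - Vt
```

The p-fet version is the same function with Vsg and Vsd in place of Vgs and Vds, and its own Vt and k.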
Inverters: p-fet Transistor Equation
If Vsg > Vt and Vsd > Vsg - Vt: Isd = (k/2) (W/L) [Vsg - Vt]^2
Otherwise, if Vsg > Vt: Isd = k (W/L) [Vsg - Vt] [Vsd]
Otherwise: Isd ≈ 0, but again, in reality there is a leakage current.
Note: Vt for p-fet and n-fet are different. Also true for k (fab constant). kp < kn, due to electrons being faster than holes.
[Figure: p-fet in the inverter, with Vin on Vg, Vout on Vd, and Isd flowing from Vs.]
Inverters with Vin = Gnd, Vout = Vdd
p-fet: Is Vsg > Vt? Is Vsd > Vsg - Vt once Vout is Vdd? Isd = k (W/L) [Vsg - Vt] [Vsd]. Vsd goes as close to 0 as it can while still supplying the leakage current.
n-fet: Ids ≈ 0, but really a small leakage current.
Inverters with Vin = Vdd, Vout = Gnd
p-fet: Isd ≈ 0, but really a small leakage current.
n-fet: Is Vgs > Vt? Is Vds > Vgs - Vt once Vout is Gnd? Ids = k (W/L) [Vgs - Vt] [Vds]. Vds goes as close to 0 as it can while still supplying the leakage current.
Calculating the inverter threshold (Vth)
Tie output to input, so Vin = Vout = Vth. Assume the voltage is somewhere near the middle.
For nfet, is Vds > Vgs - Vt? For pfet, is Vsd > Vsg - Vt? No, by definition! Use:
Ids = kn (W/L) [Vth - Vtn] [Vth]
Isd = kp (W/L) [Vdd - Vth - Vtp] [Vdd - Vth]
to compute the exact voltage in the middle.
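One way to "compute the exact voltage in the middle" is to find the voltage where the two currents balance. The Python sketch below does this by bisection, taking the slide's two current expressions at face value; a different operating-region assumption would change the formulas but not the method. The function name and all parameter values are hypothetical.

```python
def inverter_threshold(vdd, vtn, vtp, kn_wl, kp_wl, tol=1e-9):
    """Find Vth where Ids (n-fet) equals Isd (p-fet), by bisection.

    kn_wl and kp_wl stand for kn*(W/L) and kp*(W/L).
    Assumes vtn < vdd - vtp so that a crossing exists in between.
    """
    def mismatch(v):
        ids = kn_wl * (v - vtn) * v                  # n-fet expression from the slide
        isd = kp_wl * (vdd - v - vtp) * (vdd - v)    # p-fet expression from the slide
        return ids - isd

    lo, hi = vtn, vdd - vtp   # mismatch < 0 at lo, > 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Matched devices put the threshold at Vdd/2; a weaker p-fet pulls it lower.
print(inverter_threshold(1.5, 0.25, 0.25, 1.0, 1.0))
print(inverter_threshold(1.5, 0.25, 0.25, 2.0, 1.0))
```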
Question: What happens when...
[Figure: two inverter schematics, each with Vin, Vout, Isd, and Ids labeled.]
Stays at Vth until a tiny amount of Vin noise appears. Then the output goes to Vdd or Gnd until... Vin noise flips it back the other way.
Lesson: at Vth, a small dVin makes a big dVout.
Static Memory Circuits Dynamic Memory: Circuit remembers for a fraction of a second. Static Memory: Circuit remembers as long as the power is on. Non-volatile Memory: Circuit remembers for many years, even if power is off.
Recall DRAM cell: 1 T + 1 C
[Figure: DRAM array with word lines along rows and bit lines along columns, next to a single cell with word line, bit line, and Vdd.]
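Idea: Store each bit with its complement
[Figure: cell on a Row line between column lines x and x̄, storing y and ȳ as (Gnd, Vdd) or (Vdd, Gnd).]
Why? We can use the redundant representation to compensate for noise and leakage.
Case #1: y = Gnd, ȳ = Vdd...
[Figure: the cell with y at Gnd and ȳ at Vdd, showing the Isd and Ids paths that hold the state.]
Case #2: y = Vdd, ȳ = Gnd...
[Figure: the cell with y at Vdd and ȳ at Gnd, showing the Isd and Ids paths that hold the state.]
Combine both cases to complete circuit
[Figure: cross-coupled inverters between x and x̄. Nodes y and ȳ sit at (Gnd, Vdd) or (Vdd, Gnd), with noise acting on each node around Vth.]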
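The way cross-coupled inverters restore a noisy state can be illustrated with a toy iteration in Python. The inverter transfer curve here is an assumed steep sigmoid centered on Vth, not a device model from the lecture; `STEEPNESS`, `inverter`, and `settle` are hypothetical names.

```python
import math

VDD = 1.5
VTH = 0.75
STEEPNESS = 0.05  # volts; assumed, sets the gain near Vth

def inverter(vin):
    """Assumed inverter transfer curve: high output for low input."""
    return VDD / (1.0 + math.exp((vin - VTH) / STEEPNESS))

def settle(y, y_bar, steps=20):
    """Let each inverter drive the other node for a number of rounds."""
    for _ in range(steps):
        y, y_bar = inverter(y_bar), inverter(y)
    return y, y_bar

# Start 10 mV away from Vth on each side; the loop pushes the nodes to the rails.
print(settle(0.76, 0.74))
print(settle(0.74, 0.76))
```

Because the gain around Vth is greater than one, a small imbalance grows each round until the nodes saturate near Vdd and Gnd.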
SRAM Challenge #1: It's so big!
SRAM area is 6X-10X DRAM area, same generation...
Cell has both transistor types. Vdd AND Gnd. More contacts, more devices, two bit lines...
Capacitors are usually parasitic capacitance of wires and transistors.
Challenge #2: Writing is a fight
When word line goes high, bitlines fight with cell inverters to flip the bit -- must win quickly! Solution: tune W/L of cell & driver transistors.
[Figure: cell with initial state Vdd on one node and Gnd on the other; the bitlines drive Gnd and Vdd against them.]
Challenge #3: Preserving state on read
When word line goes high on read, cell inverters must drive large bitline capacitance quickly, to preserve state on its small cell capacitances.
[Figure: cell state Vdd on one node and Gnd on the other; each bitline is a big capacitor.]
SRAM vs DRAM, pros and cons
Big win for DRAM: DRAM has a 6-10X density advantage at the same technology generation.
SRAM advantages:
SRAM has deterministic latency: its cells do not need to be refreshed.
SRAM is much faster: transistors drive bitlines on reads.
SRAM is easy to design in a logic fabrication process (and premium logic processes have SRAM add-ons).
Flip Flops Revisited
Recall: Static RAM cell (6 Transistors)
[Figure: cross-coupled inverters between x and x̄, holding (Gnd, Vdd) or (Vdd, Gnd), with noise acting on each node around Vth.]
Recall: Positive edge-triggered flip-flop
A flip-flop samples right before the edge, and then holds value.
[Figure: D-to-Q flip-flop made of a sampling circuit followed by a circuit that holds the value, both clocked.]
16 Transistors: Makes an SRAM look compact! What do we get for the 10 extra transistors? Clocked logic semantics.
Sensing: When clock is low
A flip-flop samples right before the edge, and then holds value.
[Figure: the flip-flop with clk = 0 (complement = 1).]
Sampling circuit: will capture new value on posedge. Holding circuit: outputs last value captured.
Capture: When clock goes high
A flip-flop samples right before the edge, and then holds value.
[Figure: the flip-flop with clk = 1 (complement = 0).]
Sampling circuit: remembers value just captured. Holding circuit: outputs value just captured.
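The two-phase behavior described on these slides can be mimicked with a small behavioral model in Python. This is a sketch of the sample-then-hold idea only, not the 16-transistor circuit; the class and attribute names are hypothetical.

```python
class PosEdgeFlipFlop:
    """Behavioral model: a sampling stage followed by a holding stage."""

    def __init__(self):
        self.sampled = 0   # state of the sampling stage
        self.q = 0         # state of the holding stage (the output)

    def step(self, clk, d):
        if clk == 0:
            # Clock low: the sampling stage follows D, Q keeps the last captured value.
            self.sampled = d
        else:
            # Clock high: the sampling stage holds, Q outputs the value just captured.
            self.q = self.sampled
        return self.q

ff = PosEdgeFlipFlop()
for clk, d in [(0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]:
    print(f"clk={clk} d={d} -> q={ff.step(clk, d)}")
```

In this trace Q takes the value D had just before each rising edge and ignores changes on D while the clock is high.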
Admin: Final Xilinx Checkoff Friday... Lab report due Monday, 11:59 PM.
Memory Arrays
Calinx DRAM: 133 MHz, 128 Mb
From the Micron data sheet (SYNCHRONOUS DRAM, 128Mb: x4, x8, x16 SDRAM):
MT48LC32M4A2: 8 Meg x 4 x 4 banks
MT48LC16M8A2: 4 Meg x 8 x 4 banks
MT48LC8M16A2: 2 Meg x 16 x 4 banks
For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds
Data sheet on resources page. Will need to understand for final project!
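As a quick check on the three organizations listed above, the Python sketch below multiplies out each one. It assumes "Meg" means 2^20 locations; the dictionary and function names are hypothetical.

```python
MEG = 2 ** 20  # assumption: "Meg" is 2^20

# (locations per bank, bits per location, banks), as listed on the slide
PARTS = {
    "MT48LC32M4A2": (8 * MEG, 4, 4),
    "MT48LC16M8A2": (4 * MEG, 8, 4),
    "MT48LC8M16A2": (2 * MEG, 16, 4),
}

def total_bits(part):
    """Total capacity of a part in bits."""
    locations, width, banks = PARTS[part]
    return locations * width * banks

for part in PARTS:
    print(part, total_bits(part) // MEG, "Mb")
```

All three trade data-bus width against depth while keeping the same 128 Mb total.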
Last Time: 1-T DRAM cell
[Figure: 1-T DRAM cell schematic (bit line, word line, capacitor, Vdd) and die cross section (n+ regions in a p- substrate, under oxide). Word Line and Vdd run on z-axis.]
Why Vcap values start out at ground. [Figure labels: Vdd, Vcap, diode leakage current.]
Last Time: DRAM Read is Destructive
[Figure: cell with stored charge connected to a bit line (initialized to a low voltage). The word line goes 0 -> Vdd and Vc -> 0.]
Raising the word line removes the charge from every cell it connects to! Must write back after each read.
Last Time: DRAM Refresh...
Parasitic currents leak away charge (diode leakage...). Solution: Refresh, by reading cells at regular intervals (tens of milliseconds).
[Figure: cell cross section (n+ regions in p- substrate, under oxide) with bit line, word line, and Vdd.]
[Figure: DRAM array with bit lines as columns and word lines as rows.]
People buy DRAM for the bits. Edge circuits are overhead. So, we amortize the edge circuits over big arrays.
A bank of 32 Mb (128Mb chip -> 4 banks)
12-bit row address input -> 1-of-4096 decoder -> array of 4096 rows x 2048 columns, 33,554,432 usable bits (tester found good bits in bigger array) -> 2048 bits delivered by sense amps -> select requested bits, send off the chip.
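The bank numbers can be multiplied out in a few lines of Python. The 4 bits per column location is an assumption taken from the x4 organization in the data sheet listing ("8 Meg x 4 x 4 banks"); the slide itself gives only the row count, column count, and bit total.

```python
ROWS = 4096
COLUMNS = 2048
BITS_PER_LOCATION = 4   # assumption: the x4 organization
BANKS = 4

row_address_bits = (ROWS - 1).bit_length()   # bits needed to select one of 4096 rows
locations_per_bank = ROWS * COLUMNS
bits_per_bank = locations_per_bank * BITS_PER_LOCATION

print(row_address_bits)              # width of the row address input
print(locations_per_bank // 2**20)   # locations per bank, in Meg
print(bits_per_bank)                 # bits per bank
print(bits_per_bank * BANKS // 2**20, "Mb per chip")
```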
Recall DRAM Challenge #3b: Sensing
How do we reliably sense a 60mV signal? Compare the word line against the voltage on [...] a dummy word line. Cells on the dummy word line hold no charge.
[Figure: sense amp with the word line to sense on its + input and the dummy word line on its - input.]
Corresponds to row read into sense amps
Slow! This 7.5ns period DRAM (133 MHz) can do row reads at only 75 ns (≈ 13 MHz). Plus, need to add selection time. DRAM has high latency to first bit out. A fact of life.
[Figure: bank datapath. 12-bit row address input, 1-of-4096 decoder, 4096 rows x 2048 columns (33,554,432 usable bits; tester found good bits in bigger array), 2048 bits delivered by sense amps, select requested bits, send off the chip.]
An ill-timed refresh may add to latency
Parasitic currents leak away charge (diode leakage...). Solution: Refresh, by reading cells at regular intervals (tens of milliseconds).
[Figure: cell cross section (n+ regions in p- substrate, under oxide) with bit line, word line, and Vdd.]
Latency is not the same as bandwidth!
What if we want all of the 2048 bits? In row access time (75 ns) we can do 10 transfers at 133 MHz. 8-bit chip bus -> 10 x 8 = 80 bits << 2048. Now the row access time looks fast! Thus, push to faster DRAM interfaces.
[Figure: bank datapath. 12-bit row address input, 1-of-4096 decoder, 4096 rows x 2048 columns (33,554,432 usable bits; tester found good bits in bigger array), 2048 bits delivered by sense amps, select requested bits, send off the chip.]
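The arithmetic behind this comparison is short enough to spell out in Python. All figures are from the slides (7.5 ns clock period, 75 ns row access, 8-bit bus, 2048 bits per row); the variable names are hypothetical.

```python
CLOCK_PERIOD_NS = 7.5    # 133 MHz bus clock
ROW_ACCESS_NS = 75.0     # time to read a row into the sense amps
BUS_WIDTH_BITS = 8
ROW_BITS = 2048

row_rate_mhz = 1000.0 / ROW_ACCESS_NS                            # row reads per microsecond
transfers_per_row_access = round(ROW_ACCESS_NS / CLOCK_PERIOD_NS)
bits_per_row_access = transfers_per_row_access * BUS_WIDTH_BITS
transfers_for_full_row = ROW_BITS // BUS_WIDTH_BITS
full_row_transfer_ns = transfers_for_full_row * CLOCK_PERIOD_NS

print(f"row read rate: {row_rate_mhz:.1f} MHz")
print(f"bits moved during one row access time: {bits_per_row_access}")
print(f"time to send a whole row off chip: {full_row_transfer_ns:.0f} ns")
```

Moving a whole row over the 8-bit bus takes far longer than opening the row, which is why the interface speed becomes the bottleneck.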
Sadly, it's rarely this good...
What if we want all of the 2048 bits? The "we" for a CPU would be the program running on the CPU. Recall Amdahl's law: If 20% of the memory accesses need a new row access... not good.
[Figure: bank datapath. 12-bit row address input, 1-of-4096 decoder, 4096 rows x 2048 columns (33,554,432 usable bits; tester found good bits in bigger array), 2048 bits delivered by sense amps, select requested bits, send off the chip.]
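To see why 20% is "not good", here is a back-of-the-envelope Python sketch. It assumes, for illustration only, that an access to an already open row costs one 7.5 ns bus cycle and an access needing a new row costs the 75 ns row access time; the real costs depend on the command timing in the data sheet.

```python
def average_access_ns(new_row_fraction, row_access_ns=75.0, open_row_ns=7.5):
    """Weighted average access time under the two-cost assumption above."""
    return new_row_fraction * row_access_ns + (1.0 - new_row_fraction) * open_row_ns

best_case = average_access_ns(0.0)
with_misses = average_access_ns(0.2)

print(f"best case: {best_case} ns")
print(f"20% new-row accesses: {with_misses:.1f} ns")
print(f"slowdown: {with_misses / best_case:.1f}x")
```

Under these assumptions the slow 20% of accesses accounts for most of the average time.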
DRAM latency/bandwidth chip features
Columns: Design the right interface for a CPU to request the subset of a column of data it wishes. (2048 bits delivered by sense amps; select requested bits, send off the chip.)
Interleaving: Design the right interface to the 4 memory banks on the chip (Bank 1, Bank 2, Bank 3, Bank 4), so several row requests run in parallel.
Off-chip interface for the Micron part...
A clocked bus protocol (133 MHz). DRAM is controlled via commands (READ, WRITE, REFRESH, ...). Synchronous data output.
[Figure: READ timing from the Micron 128 Mb SDRAM data sheet (on resources web page). CLK cycles T0-T3; COMMAND: READ, NOP, NOP; DQ delivers DOUT with CAS Latency = 2 (CAS = Column Address Strobe); timing parameters tLZ, tOH, tAC.]
Note! This example is best-case! To access a new row, a slow ACTIVE command must run before the READ.
Opening a row before reading...
[Figure: timing diagram over CLK cycles T0-T8. COMMAND: ACTIVE, NOP, NOP (3), NOP (3), READ, NOP, NOP, ACTIVE, NOP. Address pins A0-A9, A11 carry ROW, then COLUMN m (2), then ROW; A10 carries ROW, ENABLE AUTO PRECHARGE, ROW; BA0, BA1 carry BANK each time. DQ delivers DOUT m. Other signals: CKE, DQM / DQML, DQMH. Timing parameters shown: tCK, tCL, tCH, tCKS, tCKH, tCMS, tCMH, tAS, tAH, tRCD, tRAS, tRC, tRP, tAC, tOH, tHZ.]
Annotations: 44 ns + CAS Latency; tAC 6 ns; 70 ns between row opens.
Interleave: Access all 4 banks in parallel
[Figure 8: Random READ Accesses. CLK cycles T0-T6. COMMAND: READ, READ, READ, READ, NOP, NOP, NOP. ADDRESS: (BANK, COL n), (BANK, COL a), (BANK, COL x), (BANK, COL m). DQ: DOUT n, DOUT a, DOUT x, DOUT m. CAS Latency = 3.]
NOTE: Each READ command may be to any bank. DQM is LOW.
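The pipelining in this figure can be sketched as a tiny Python schedule: each READ issued on cycle T puts its data on DQ at cycle T + CAS latency, so back-to-back READs keep the data bus busy every cycle. The function name and the list-of-tuples encoding are hypothetical; the column labels and CAS latency of 3 come from the figure.

```python
CAS_LATENCY = 3

def data_out_cycles(read_commands, cas_latency=CAS_LATENCY):
    """Map each READ (issue cycle, column label) to the cycle its data appears."""
    return {column: cycle + cas_latency for cycle, column in read_commands}

# One READ per clock, each to an already open row in some bank.
reads = [(0, "n"), (1, "a"), (2, "x"), (3, "m")]
for column, cycle in data_out_cycles(reads).items():
    print(f"DOUT {column} at T{cycle}")
```

The latency of each individual READ is unchanged; only the throughput improves because the accesses overlap.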
Lectures: Coming up next... Essential tools for the final project.