Introduction to CMOS VLSI Design
Lecture 10: Sequential Circuits
David Harris, Harvey Mudd College, Spring 2004
Outline
- Floorplanning
- Sequencing
- Sequencing Element Design
- Max- and Min-Delay
- Clock Skew
- Time Borrowing
- Two-Phase Clocking
Project Strategy
- Proposal
  - Specifies inputs, outputs, and the relation between them
- Floorplan
  - Begins with a block diagram
  - Annotate the dimensions and location of each block
  - Requires a detailed paper design
- Schematic
  - Make the paper design simulate correctly
- Layout
  - Physical design, DRC, NCC, ERC
Floorplan
- How do you estimate block areas?
  - Begin with the block diagram
  - Each block has:
    - Inputs
    - Outputs
    - Function (draw schematic)
    - Type: array, datapath, random logic
- Estimation depends on the type of logic
MIPS Floorplan
[Figure: MIPS chip floorplan. The chip is 5000 λ on a side with 10 I/O pads per side; the area inside the pad ring is 3500 λ on a side. Labeled blocks:]
  Block            Dimensions           Area
  mips             2700 λ x 1690 λ      4.6 Mλ²
  control          1500 λ x 400 λ       0.6 Mλ²
  wiring channel   30 tracks = 240 λ
  datapath         2700 λ x 1050 λ      2.8 Mλ²
  zipper           2700 λ x 250 λ
  bitslice         2700 λ x 100 λ
  alucontrol       200 λ x 100 λ        20 kλ²
Area Estimation
- Arrays:
  - Lay out the basic cell
  - Calculate core area from the number of cells
  - Allow area for decoders and column circuitry
- Datapaths:
  - Sketch a slice plan
  - Count area of cells from the cell library
  - Ensure wiring is possible
- Random logic:
  - Compare complexity to a design you have done
MIPS Slice Plan
[Figure: slice plan for one bitslice of the MIPS datapath, annotated with cell widths in λ. Buses: srcb, srca, bitlines, aluresult, writedata, memdata, adr, immediate, pc, aluout. Functional groups: ALU, zerodetect, PC, aluout, srcb, srca, register file (ramslice), writemux, MDR, IR3...0, adrmux. Cells used: mux4, mux2, fulladder, or2, and2, inv, flop, readmux, srampullup, dualsrambit0, dualsram, writedriver.]
Typical Layout Densities
- Typical numbers for high-quality layout
- Derate by 2 for class projects to allow routing and some sloppy layout
- Allocate space for big wiring channels
  Element                          Area
  Random logic (2 metal layers)    1000-1500 λ² / transistor
  Datapath                         250-750 λ² / transistor, or 6 WL + 360 λ² / transistor
  SRAM                             1000 λ² / bit
  DRAM                             100 λ² / bit
  ROM                              100 λ² / bit
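As a rough illustration of how the densities in the table turn into a block-area estimate, the sketch below multiplies a transistor count by a density and the class-project derating factor of 2. The function name, the dictionary, and the 1000-transistor block are hypothetical; the densities are the upper ends of the ranges quoted on the slide.

```python
# Densities in lambda^2 per transistor: upper ends of the slide's ranges.
DENSITY = {"random_logic": 1500, "datapath": 750}

def estimate_area(kind, transistors, derate=2):
    """Estimated block area in lambda^2.

    derate=2 follows the slide's advice for class projects; use 1 for
    the 'high-quality layout' numbers.
    """
    return DENSITY[kind] * transistors * derate

# Hypothetical 1000-transistor datapath block.
print(estimate_area("datapath", 1000))  # 1500000 lambda^2 = 1.5 M-lambda^2
```

The result is only a starting point for the floorplan; wiring channels still need their own allocation.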
Sequencing
- Combinational logic
  - Output depends on current inputs
- Sequential logic
  - Output depends on current and previous inputs
  - Requires separating previous, current, and future
  - Called state or tokens
  - Ex: FSM, pipeline
[Figure: a finite state machine and a pipeline, each built from combinational logic (CL) blocks and clocked sequencing elements.]
Sequencing Cont.
- If tokens moved through the pipeline at constant speed, no sequencing elements would be necessary
- Ex: fiber-optic cable
  - Light pulses (tokens) are sent down the cable
  - Next pulse is sent before the first reaches the end of the cable
  - No need for hardware to separate pulses
  - But dispersion sets the minimum time between pulses
- This is called wave pipelining in circuits
- In most circuits, dispersion is high
  - Delay fast tokens so they don't catch slow ones
Sequencing Overhead
- Use flip-flops to delay fast tokens so they move through exactly one stage each cycle
- Inevitably adds some delay to the slow tokens
- Makes the circuit slower than just the logic delay
  - Called sequencing overhead
- Some people call this clocking overhead
  - But it applies to asynchronous circuits too
  - Inevitable side effect of maintaining sequence
Sequencing Elements
- Latch: level sensitive
  - a.k.a. transparent latch, D latch
- Flip-flop: edge triggered
  - a.k.a. master-slave flip-flop, D flip-flop, D register
- Timing diagrams
  - Transparent
  - Opaque
  - Edge-trigger
[Figure: latch and flop symbols with clk, D, Q (latch), and Q (flop) waveforms.]
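The difference between level-sensitive and edge-triggered behavior can be seen in a small behavioral model. This is a minimal sketch, assuming one sample per time step, ideal zero-delay elements, and outputs that start at 0; the function name and waveforms are made up for illustration.

```python
def simulate(clk_wave, d_wave):
    """Return (q_latch, q_flop) at each time step.

    The latch is transparent while clk is 1 and opaque while clk is 0.
    The flop samples D only on a rising edge of clk.
    """
    q_latch, q_flop, prev_clk = 0, 0, 0
    trace = []
    for clk, d in zip(clk_wave, d_wave):
        if clk:                      # transparent: Q follows D
            q_latch = d
        if clk and not prev_clk:     # rising edge: capture D
            q_flop = d
        prev_clk = clk
        trace.append((q_latch, q_flop))
    return trace

clk = [0, 1, 1, 0, 1]
d   = [1, 1, 0, 0, 1]
print(simulate(clk, d))
# [(0, 0), (1, 1), (0, 1), (0, 1), (1, 1)]
```

At the third step D falls while the clock is still high: the latch output follows it to 0, while the flop holds the 1 it captured on the edge.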
Latch Design
- Pass transistor latch
- Pros
  + Tiny
  + Low clock load
- Cons
  - V_t drop
  - Nonrestoring
  - Backdriving
  - Output noise sensitivity
  - Dynamic
  - Diffusion input
- Used in the 1970s
Latch Design
- Transmission gate
  + No V_t drop
  - Requires inverted clock
Latch Design
- Inverting buffer
  + Restoring
  + No backdriving
  + Fixes either
    - Output noise sensitivity
    - Or diffusion input
  - Inverted output
Latch Design
- Tristate feedback
  + Static
  - Backdriving risk
- Static latches are now essential
Latch Design
- Buffered input
  + Fixes diffusion input
  + Noninverting
Latch Design
- Buffered output
  + No backdriving
- Widely used in standard cells
  + Very robust (most important)
  - Rather large
  - Rather slow (1.5-2 FO4 delays)
  - High clock loading
Latch Design
- Datapath latch
  + Smaller, faster
  - Unbuffered input
Flip-Flop Design
- A flip-flop is built as a pair of back-to-back latches
[Figure: two latch stages in series with internal node X.]
Enable
- Enable: ignore clock when en = 0
  - Mux: increases latch D-Q delay
  - Clock gating: increases en setup time and skew
[Figure: symbol, multiplexer design, and clock gating design for a latch and a flop with enable input en.]
Reset
- Force output low when reset is asserted
- Synchronous vs. asynchronous
[Figure: latch and flop symbols with reset input; synchronous-reset and asynchronous-reset circuits.]
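The distinction between the two reset styles is when the reset takes effect. The following behavioral sketch is one way to model a flip-flop with either style; the function name and its arguments are hypothetical, and it ignores all timing.

```python
def flop_step(q, d, clk_rise, reset, asynchronous):
    """Next Q of a resettable flip-flop for one time step.

    q            current output
    d            data input
    clk_rise     True if a rising clock edge occurs in this step
    reset        True if reset is asserted
    asynchronous True for asynchronous reset, False for synchronous
    """
    # Asynchronous reset acts immediately; synchronous reset waits for the edge.
    if reset and (asynchronous or clk_rise):
        return 0
    return d if clk_rise else q

# Reset asserted with no clock edge: only the asynchronous flop clears.
print(flop_step(q=1, d=1, clk_rise=False, reset=True, asynchronous=True))   # 0
print(flop_step(q=1, d=1, clk_rise=False, reset=True, asynchronous=False))  # 1
```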
Set / Reset
- Set forces output high when enabled
- Flip-flop with asynchronous set and reset
[Figure: flip-flop circuit with set and reset inputs.]
Sequencing Methods
- Flip-flops
- 2-Phase Latches
- Pulsed Latches
[Figure: clock waveforms and circuits for the three methods. Flip-flops: one clock with period $T_c$, combinational logic between two flops. 2-phase transparent latches: clocks $\phi_1$ and $\phi_2$ separated by $t_{nonoverlap}$, with combinational logic in each half-cycle of length $T_c/2$. Pulsed latches: pulse clock $\phi_p$ of width $t_{pw}$, combinational logic between two latches.]
Timing Diagrams
Contamination and propagation delays:
  $t_{pd}$      Logic propagation delay
  $t_{cd}$      Logic contamination delay
  $t_{pcq}$     Latch/flop clk-Q propagation delay
  $t_{ccq}$     Latch/flop clk-Q contamination delay
  $t_{pdq}$     Latch D-Q propagation delay
  $t_{cdq}$     Latch D-Q contamination delay
  $t_{setup}$   Latch/flop setup time
  $t_{hold}$    Latch/flop hold time
[Figure: timing diagrams for combinational logic (A to Y, $t_{cd}$ and $t_{pd}$), a flop ($t_{setup}$, $t_{hold}$, $t_{ccq}$, $t_{pcq}$), and a latch ($t_{setup}$, $t_{hold}$, $t_{ccq}$, $t_{pcq}$, $t_{cdq}$, $t_{pdq}$).]
Max-Delay: Flip-Flops
  $t_{pd} \le T_c - (t_{setup} + t_{pcq})$
  where $t_{setup} + t_{pcq}$ is the sequencing overhead.
[Figure: combinational logic between flops F1 and F2, with a timing diagram showing $t_{pcq}$, $t_{pd}$, and $t_{setup}$ within one clock period $T_c$.]
Max-Delay: 2-Phase Latches
  $t_{pd} = t_{pd1} + t_{pd2} \le T_c - 2t_{pdq}$
  where $2t_{pdq}$ is the sequencing overhead.
[Figure: latches L1, L2, L3 clocked by $\phi_1$, $\phi_2$, $\phi_1$ with Combinational Logic 1 and 2 between them; the timing diagram shows $t_{pdq1}$, $t_{pd1}$, $t_{pdq2}$, $t_{pd2}$ within one clock period $T_c$.]
Max-Delay: Pulsed Latches
  $t_{pd} \le T_c - \max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw})$
  where the $\max(\cdot)$ term is the sequencing overhead.
[Figure: combinational logic between pulsed latches L1 and L2 clocked by $\phi_p$. Case (a), $t_{pw} > t_{setup}$: the path is limited by $t_{pdq}$ and $t_{pd}$. Case (b), $t_{pw} < t_{setup}$: the path is limited by $t_{pcq}$, $t_{pd}$, and $t_{setup}$ relative to the pulse of width $t_{pw}$.]
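To compare the three max-delay constraints side by side, the sketch below evaluates the sequencing overhead of each method. The function names and all numeric values (in picoseconds) are illustrative assumptions, not data from the lecture.

```python
def overhead_flop(t_setup, t_pcq):
    """Flip-flops: t_pd <= T_c - (t_setup + t_pcq)."""
    return t_setup + t_pcq

def overhead_two_phase(t_pdq):
    """2-phase latches: t_pd <= T_c - 2 * t_pdq."""
    return 2 * t_pdq

def overhead_pulsed(t_pdq, t_pcq, t_setup, t_pw):
    """Pulsed latches: t_pd <= T_c - max(t_pdq, t_pcq + t_setup - t_pw)."""
    return max(t_pdq, t_pcq + t_setup - t_pw)

# Hypothetical element parameters, in ps.
T_C, T_SETUP, T_PCQ, T_PDQ = 1000, 60, 80, 50

print(T_C - overhead_flop(T_SETUP, T_PCQ))                  # 860
print(T_C - overhead_two_phase(T_PDQ))                      # 900
print(T_C - overhead_pulsed(T_PDQ, T_PCQ, T_SETUP, 100))    # 950 (wide pulse)
print(T_C - overhead_pulsed(T_PDQ, T_PCQ, T_SETUP, 30))     # 890 (narrow pulse)
```

With these made-up numbers, a pulse wider than the setup time leaves only the D-Q delay as overhead, which is case (a) on the slide.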
Min-Delay: Flip-Flops
  $t_{cd} \ge t_{hold} - t_{ccq}$
[Figure: combinational logic (CL) between flops F1 and F2, with a timing diagram showing $t_{ccq}$, $t_{cd}$, and $t_{hold}$.]
Min-Delay: 2-Phase Latches
  $t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap}$
- Hold time is reduced by the nonoverlap
- Paradox: hold applies twice each cycle, vs. only once for flops
- But a flop is made of two latches!
[Figure: combinational logic (CL) between latches L1 and L2 clocked by $\phi_1$ and $\phi_2$, with a timing diagram showing $t_{nonoverlap}$, $t_{ccq}$, $t_{cd}$, and $t_{hold}$.]
Min-Delay: Pulsed Latches
  $t_{cd} \ge t_{hold} - t_{ccq} + t_{pw}$
- Hold time is increased by the pulse width
[Figure: combinational logic (CL) between pulsed latches L1 and L2 clocked by $\phi_p$, with a timing diagram showing $t_{pw}$, $t_{hold}$, $t_{ccq}$, and $t_{cd}$.]
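The three min-delay constraints differ only in how the clocking scheme shifts the hold requirement. A minimal sketch, with hypothetical function names and picosecond values chosen purely for illustration:

```python
def min_cd_flop(t_hold, t_ccq):
    """Flip-flops: t_cd >= t_hold - t_ccq."""
    return t_hold - t_ccq

def min_cd_two_phase(t_hold, t_ccq, t_nonoverlap):
    """2-phase latches: t_cd1, t_cd2 >= t_hold - t_ccq - t_nonoverlap."""
    return t_hold - t_ccq - t_nonoverlap

def min_cd_pulsed(t_hold, t_ccq, t_pw):
    """Pulsed latches: t_cd >= t_hold - t_ccq + t_pw."""
    return t_hold - t_ccq + t_pw

# Hypothetical parameters, in ps.
T_HOLD, T_CCQ = 40, 30

print(min_cd_flop(T_HOLD, T_CCQ))            # 10
print(min_cd_two_phase(T_HOLD, T_CCQ, 50))   # -40: met by any logic, even a wire
print(min_cd_pulsed(T_HOLD, T_CCQ, 100))     # 110
```

A negative bound means the constraint is satisfied for any non-negative contamination delay, which is why a large nonoverlap makes 2-phase latches safe against hold failures, while a wide pulse makes pulsed latches the riskiest of the three.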
Time Borrowing
- In a flop-based system:
  - Data launches on one rising edge
  - Must set up before the next rising edge
  - If it arrives late, the system fails
  - If it arrives early, time is wasted
  - Flops have hard edges
- In a latch-based system:
  - Data can pass through a latch while it is transparent
  - A long cycle of logic can borrow time into the next
  - As long as each loop completes in one cycle
Time Borrowing Example
[Figure (a): latch, combinational logic, latch, combinational logic, latch, clocked by $\phi_1$, $\phi_2$, $\phi_1$. Shows borrowing time across a half-cycle boundary and borrowing time across a pipeline stage boundary.]
[Figure (b): latch, combinational logic, latch, combinational logic in a feedback loop clocked by $\phi_1$ and $\phi_2$. Loops may borrow time internally but must complete within the cycle.]
How Much Borrowing?
- 2-Phase Latches:
  $t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap})$
- Pulsed Latches:
  $t_{borrow} \le t_{pw} - t_{setup}$
[Figure: combinational logic between latches L1 ($\phi_1$) and L2 ($\phi_2$); the timing diagram marks $T_c$, $T_c/2$, the nominal half-cycle 1 delay, $t_{nonoverlap}$, $t_{borrow}$, and $t_{setup}$.]
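As a worked instance of the two borrowing limits, the sketch below plugs in sample numbers. The function names and the picosecond values are assumptions made for illustration only.

```python
def borrow_two_phase(t_c, t_setup, t_nonoverlap):
    """2-phase latches: t_borrow <= T_c/2 - (t_setup + t_nonoverlap)."""
    return t_c / 2 - (t_setup + t_nonoverlap)

def borrow_pulsed(t_pw, t_setup):
    """Pulsed latches: t_borrow <= t_pw - t_setup."""
    return t_pw - t_setup

# Hypothetical parameters, in ps.
print(borrow_two_phase(t_c=1000, t_setup=60, t_nonoverlap=50))  # 390.0
print(borrow_pulsed(t_pw=100, t_setup=60))                      # 40
```

With these numbers a 2-phase design can borrow almost ten times as much as the pulsed-latch design, because its transparent window is nearly half the cycle rather than one short pulse.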
Clock Skew
- We have assumed zero clock skew
- Clocks really have uncertainty in arrival time
  - Decreases maximum propagation delay
  - Increases minimum contamination delay
  - Decreases time borrowing
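Skew: Flip-Flops
  $t_{pd} \le T_c - (t_{pcq} + t_{setup} + t_{skew})$
  where $t_{pcq} + t_{setup} + t_{skew}$ is the sequencing overhead.
  $t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$
[Figure: combinational logic between flops F1 and F2. Setup case: $t_{pcq}$, $t_{pd}$, and $t_{setup}$ must fit in $T_c$ less $t_{skew}$. Hold case: $t_{ccq}$ plus $t_{cd}$ must exceed $t_{hold}$ plus $t_{skew}$.]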
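Both flip-flop constraints with skew can be checked for a given path in a few lines. This is a sketch under assumed values: the function name and every number (in picoseconds) are hypothetical.

```python
def flop_path_ok(t_pd, t_cd, t_c, t_setup, t_hold, t_pcq, t_ccq, t_skew):
    """Return (setup_ok, hold_ok) for a flop-to-flop path with clock skew."""
    setup_ok = t_pd <= t_c - (t_pcq + t_setup + t_skew)   # max-delay check
    hold_ok = t_cd >= t_hold - t_ccq + t_skew             # min-delay check
    return setup_ok, hold_ok

# Hypothetical path: 800 ps worst case, 20 ps best case, 1 ns clock.
print(flop_path_ok(t_pd=800, t_cd=20, t_c=1000, t_setup=60, t_hold=40,
                   t_pcq=80, t_ccq=30, t_skew=50))
# (True, False): setup has 10 ps of slack, hold is short by 40 ps
```

Without skew this same path would pass the hold check (20 >= 10), which shows how skew eats directly into the min-delay margin.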
Skew: Latches
- 2-Phase Latches:
  $t_{pd} \le T_c - 2t_{pdq}$
  where $2t_{pdq}$ is the sequencing overhead.
  $t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$
  $t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$
- Pulsed Latches:
  $t_{pd} \le T_c - \max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw} + t_{skew})$
  where the $\max(\cdot)$ term is the sequencing overhead.
  $t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$
  $t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$
[Figure: latches L1, L2, L3 clocked by $\phi_1$, $\phi_2$, $\phi_1$ with Combinational Logic 1 and 2 between them.]
Two-Phase Clocking
- If setup times are violated, reduce clock speed
- If hold times are violated, the chip fails at any speed
- In this class, working chips are most important
  - No tools to analyze clock skew
- An easy way to guarantee hold times is to use 2-phase latches with big nonoverlap times
- Call these clocks $\phi_1$, $\phi_2$ (ph1, ph2)
Safe Flip-Flop
- In class, use a flip-flop with nonoverlapping clocks
  - Very slow: nonoverlap adds to setup time
  - But no hold times
- In industry, use a better timing analyzer
  - Add buffers to slow signals if hold time is at risk
[Figure: flip-flop built from two latches clocked by nonoverlapping $\phi_1$ and $\phi_2$ and their complements, with internal node X.]
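The industrial fix of adding buffers to slow signals can be sized from the flip-flop hold constraint with skew, $t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$. The sketch below is a simplified estimate: the function name, the per-buffer contamination delay `t_buf_cd`, and all values (in picoseconds) are assumptions, and it ignores the extra propagation delay the buffers add to the max-delay path.

```python
import math

def hold_fix_buffers(t_cd, t_hold, t_ccq, t_skew, t_buf_cd):
    """Number of buffers needed so the path meets the hold constraint.

    t_buf_cd is the contamination delay contributed by each added buffer
    (a hypothetical library value).
    """
    shortfall = (t_hold - t_ccq + t_skew) - t_cd
    if shortfall <= 0:
        return 0                      # hold time already met
    return math.ceil(shortfall / t_buf_cd)

# Hypothetical path that misses hold by 40 ps; each buffer adds 15 ps.
print(hold_fix_buffers(t_cd=20, t_hold=40, t_ccq=30, t_skew=50, t_buf_cd=15))  # 3
```

A real flow would recheck the setup constraint after insertion, since every added buffer also lengthens the slowest path through the same logic.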
Summary
- Flip-Flops:
  - Very easy to use, supported by all tools
- 2-Phase Transparent Latches:
  - Lots of skew tolerance and time borrowing
- Pulsed Latches:
  - Fast, some skew tolerance and borrowing, hold time risk