Sequencing Lan-Da Van (范倫達), Ph. D. Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C. Fall, 2013 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/
Outlines Introduction Sequencing Methods Latches and Flip-Flops Sequential System Design Conclusion VLSI-09-2
Sequential Machines Use memory elements to make primary output values depend on (state + primary inputs). Varieties: Mealy machines: outputs are a function of present state and inputs; Moore machines: outputs depend only on state. Machine computes next state N and primary outputs O from current state S and primary inputs I. Next-state function: $N = \delta(I, S)$. Output function (Mealy): $O = \lambda(I, S)$. Duty cycle: fraction of clock period for which clock is active (e.g., for active-low clock, fraction of time clock is 0). VLSI-09-3
FSM Structure VLSI-09-4
Sequencing Elements Latch: level sensitive (also called transparent latch, D latch). Flip-flop: edge triggered (also called master-slave flip-flop, D flip-flop, D register). [Figure: latch and flop symbols with clk, D, Q; timing diagram of clk, D, Q (latch), Q (flop) contrasting transparent and edge-triggered behavior] VLSI-09-5
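The difference between level-sensitive and edge-triggered behavior can be sketched with a small behavioral model. This is a minimal sketch under stated assumptions: zero-delay ideal elements, a latch that is transparent while clk is 1, a flop that samples on the rising edge, and hypothetical sample waveforms (`clk`, `d`) that are not from the slides.

```python
def simulate(clk, d):
    """Return (q_latch, q_flop) traces for sampled clk and d waveforms.

    Idealized model: no setup/hold or propagation delay is represented.
    """
    q_latch, q_flop = [], []
    latch_state = 0
    flop_state = 0
    prev_clk = 0
    for c, x in zip(clk, d):
        if c == 1:
            # Latch is transparent while clk is high: Q follows D.
            latch_state = x
        if prev_clk == 0 and c == 1:
            # Flop samples D only on the rising edge of clk.
            flop_state = x
        q_latch.append(latch_state)
        q_flop.append(flop_state)
        prev_clk = c
    return q_latch, q_flop


# Hypothetical waveforms, one sample per time step.
clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
q_latch, q_flop = simulate(clk, d)
print("Q (latch):", q_latch)
print("Q (flop): ", q_flop)
```

In this trace the latch output changes in the middle of a clock-high phase whenever D changes, while the flop output changes only at the two rising edges.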
Memory Elements Store a value as controlled by one or more control inputs (e.g., clock, load, S-R). In CMOS, memory is created by: capacitance (dynamic); feedback (static). Storage elements: Latch: transparent when internal memory is being set from input. Flip-flop: not transparent; reading input and changing output are separate events. VLSI-09-6
Memory Categories
Memory Arrays
  Random Access Memory
    Read/Write Memory (RAM) (Volatile)
      Static RAM (SRAM)
      Dynamic RAM (DRAM)
    Read Only Memory (ROM) (Nonvolatile)
      Mask ROM
      Programmable ROM (PROM)
      Erasable Programmable ROM (EPROM)
      Electrically Erasable Programmable ROM (EEPROM)
      Flash ROM
  Serial Access Memory
    Shift Registers
      Serial In Parallel Out (SIPO)
      Parallel In Serial Out (PISO)
    Queues
      First In First Out (FIFO)
      Last In First Out (LIFO)
  Content Addressable Memory (CAM)
VLSI-09-7
Setup & Hold Times Setup time: time before clock event during which data input must be stable. Hold time: time after clock event for which data input must remain stable. [Figure: clock and data waveforms] VLSI-09-8
Sequencing Methods Three methods: flip-flops; 2-phase transparent latches; pulsed latches. [Figure: clock waveforms and pipeline structure for each method. Flip-flops: clk with period $T_c$, combinational logic between two flops. 2-phase latches: clocks $\phi_1$ and $\phi_2$ over $T_c/2$ half-cycles, separated by $t_{nonoverlap}$, with combinational logic split into half-cycles between latches. Pulsed latches: pulse clock $\phi_p$ of width $t_{pw}$, combinational logic between two latches.] VLSI-09-9
Contamination and Propagation Delays
Symbol        Meaning
$t_{pd}$      Logic Prop. Delay
$t_{cd}$      Logic Cont. Delay
$t_{pcq}$     Latch/Flop Clk-Q Prop Delay
$t_{ccq}$     Latch/Flop Clk-Q Cont. Delay
$t_{pdq}$     Latch D-Q Prop Delay
$t_{cdq}$     Latch D-Q Cont. Delay
$t_{setup}$   Latch/Flop Setup Time
$t_{hold}$    Latch/Flop Hold Time
[Figure: timing diagrams for combinational logic (A to Y), a flop, and a latch, marking the delays above]
VLSI-09-10
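Max-Delay: Flip-Flops $t_{pd} \le T_c - \underbrace{(t_{setup} + t_{pcq})}_{\text{sequencing overhead}}$ [Figure: flip-flops F1 and F2 clocked by clk, with combinational logic between Q1 and D2; timing diagram marking $t_{pcq}$, $t_{pd}$, and $t_{setup}$ within one period $T_c$] VLSI-09-11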
Max-Delay Example (1/2) Suppose the registers are built from flip-flops with a setup time of 62 ps, hold time of -10 ps, propagation delay of 90 ps and contamination delay of 75 ps. VLSI-09-12
Max-Delay Example (2/2) $T_c \ge t_{pcq} + t_{pd} + t_{setup}$; $t_{pd} = 590 + 60 + 100 + 80 + 100 + 70 = 1000$ ps; $T_c \ge 90 + 1000 + 62 = 1152$ ps. VLSI-09-13
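The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch: the function name `min_cycle_time_flop` is made up for illustration, and the numbers are the ones given in the example (all in picoseconds).

```python
def min_cycle_time_flop(t_pd, t_pcq, t_setup):
    """Smallest T_c that satisfies T_c >= t_pcq + t_pd + t_setup."""
    return t_pcq + t_pd + t_setup


# Logic delays along the path, as listed in the example (ps).
stage_delays_ps = [590, 60, 100, 80, 100, 70]
t_pd = sum(stage_delays_ps)

# Flip-flop parameters from the example: t_pcq = 90 ps, t_setup = 62 ps.
t_c = min_cycle_time_flop(t_pd, t_pcq=90, t_setup=62)
print(f"t_pd = {t_pd} ps, minimum T_c = {t_c} ps")
```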
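Max Delay: 2-Phase Latches $t_{pd} = t_{pd1} + t_{pd2} \le T_c - \underbrace{2 t_{pdq}}_{\text{sequencing overhead}}$ [Figure: latches L1 ($\phi_1$), L2 ($\phi_2$), L3 ($\phi_1$) with Combinational Logic 1 between Q1 and D2 and Combinational Logic 2 between Q2 and D3; timing diagram marking $t_{pdq1}$, $t_{pd1}$, $t_{pdq2}$, $t_{pd2}$ within $T_c$] VLSI-09-14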
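Max Delay: Pulsed Latches $t_{pd} \le T_c - \underbrace{\max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw})}_{\text{sequencing overhead}}$ [Figure: latches L1 and L2 clocked by $\phi_p$ with combinational logic between Q1 and D2; timing diagrams for (a) $t_{pw} > t_{setup}$, marking $t_{pdq}$ and $t_{pd}$, and (b) $t_{pw} < t_{setup}$, marking $t_{pcq}$, $t_{pd}$, $t_{setup}$, and $t_{pw}$] VLSI-09-15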
Max-Delay Example Re-compute the ALU self-bypass path cycle time if the flip-flop is replaced with a pulsed latch. The pulsed latch has a pulse width of 150 ps, a setup time of 40 ps, a hold time of 5 ps, a clk-to-Q propagation delay of 82 ps and contamination delay of 52 ps, and a D-to-Q propagation delay of 92 ps. Solution: $t_{pd} \le T_c - \max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw})$, where the max term is the sequencing overhead, so $T_c \ge \max(92 + 1000,\; 82 + 1000 + 40 - 150) = \max(1092, 972) = 1092$ ps. VLSI-09-16
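The same calculation can be sketched in Python. The function name `min_cycle_time_pulsed` is hypothetical. The latch parameters are those stated in the example, and the 1000 ps logic delay is the value implied by its solution.

```python
def min_cycle_time_pulsed(t_pd, t_pdq, t_pcq, t_setup, t_pw):
    """Smallest T_c with T_c >= t_pd + max(t_pdq, t_pcq + t_setup - t_pw)."""
    overhead = max(t_pdq, t_pcq + t_setup - t_pw)
    return t_pd + overhead


# Values from the example, all in ps.
t_c_pulsed = min_cycle_time_pulsed(
    t_pd=1000, t_pdq=92, t_pcq=82, t_setup=40, t_pw=150
)
print(f"minimum T_c with pulsed latches = {t_c_pulsed} ps")
```

Because the pulse is wider than the setup time here, the D-to-Q delay term dominates the overhead.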
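Min-Delay: Flip-Flops $t_{cd} \ge t_{hold} - t_{ccq}$ [Figure: flip-flops F1 and F2 clocked by clk, with logic CL between Q1 and D2; timing diagram marking $t_{ccq}$, $t_{cd}$, and $t_{hold}$] VLSI-09-17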
Min-Delay Example In the ALU self-bypass example with the flip-flop from Fig. 7.6, the earliest input to the late bypass multiplexer is the imm value coming from another flip-flop. Will this path experience any hold time failures? Solution: No. The late bypass mux has $t_{cd} = 45$ ps. The flip-flops have $t_{hold} = -10$ ps and $t_{ccq} = 75$ ps. Hence $t_{cd} = 45$ ps is larger than $t_{hold} - t_{ccq} = -10 - 75 = -85$ ps. VLSI-09-18
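The hold check in this example reduces to one comparison, sketched below. The helper name `hold_slack_flop` is made up for illustration; the numbers are the ones quoted in the solution (ps).

```python
def hold_slack_flop(t_cd, t_hold, t_ccq):
    """Slack against the flip-flop min-delay constraint t_cd >= t_hold - t_ccq.

    A negative result would indicate a hold-time violation.
    """
    return t_cd - (t_hold - t_ccq)


slack = hold_slack_flop(t_cd=45, t_hold=-10, t_ccq=75)
print(f"hold slack = {slack} ps, violation = {slack < 0}")
```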
Min-Delay: 2-Phase Latches $t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap}$ Hold time reduced by nonoverlap. Paradox: hold applies twice each cycle, vs. only once for flops. But a flop is made of two latches! [Figure: latch L1 ($\phi_1$) driving Q1 through CL to D2 of latch L2 ($\phi_2$); timing diagram marking $t_{nonoverlap}$, $t_{ccq}$, $t_{cd}$, and $t_{hold}$] VLSI-09-19
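Min-Delay: Pulsed Latches $t_{cd} \ge t_{hold} - t_{ccq} + t_{pw}$ Hold time increased by pulse width. [Figure: latches L1 and L2 clocked by $\phi_p$, with logic CL between Q1 and D2; timing diagram marking $t_{pw}$, $t_{hold}$, $t_{ccq}$, and $t_{cd}$] VLSI-09-20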
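To see how the pulse width raises the minimum contamination delay, the constraint can be evaluated numerically. This is an illustrative sketch only: it reuses the pulsed-latch parameters from the earlier max-delay example (hold 5 ps, clk-to-Q contamination delay 52 ps, pulse width 150 ps) and the 45 ps mux contamination delay from the flip-flop min-delay example. Combining them this way is an assumption made here, not a result stated on the slides.

```python
def min_cd_pulsed(t_hold, t_ccq, t_pw):
    """Minimum logic contamination delay: t_cd >= t_hold - t_ccq + t_pw."""
    return t_hold - t_ccq + t_pw


required_cd = min_cd_pulsed(t_hold=5, t_ccq=52, t_pw=150)
available_cd = 45  # ps, assumed shortest path through the logic
slack = available_cd - required_cd
print(f"required t_cd = {required_cd} ps, slack = {slack} ps")
```

Under these assumed numbers the slack is negative, which illustrates the hold time risk of pulsed latches. A common remedy is to add delay to the short path.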
Time Borrowing In a flop-based system: data launches on one rising edge; must set up before the next rising edge; if it arrives late, the system fails; if it arrives early, time is wasted; flops have hard edges. In a latch-based system: data can pass through a latch while it is transparent; a long cycle of logic can borrow time into the next, as long as each loop completes in one cycle. VLSI-09-21
Time Borrowing Example [Figure: (a) a latch pipeline clocked by $\phi_1$, $\phi_2$, $\phi_1$ with combinational logic between latches, showing borrowing time across a half-cycle boundary and across a pipeline stage boundary; (b) a loop of $\phi_1$ and $\phi_2$ latches with combinational logic.] Loops may borrow time internally but must complete within the cycle. VLSI-09-22
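How Much Borrowing? 2-Phase Latches: $t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap})$. Pulsed Latches: $t_{borrow} \le t_{pw} - t_{setup}$. [Figure: latches L1 ($\phi_1$) and L2 ($\phi_2$) with Combinational Logic 1 between Q1 and D2; timing diagram marking $T_c$, $T_c/2$, $t_{nonoverlap}$, the nominal half-cycle 1 delay, $t_{borrow}$, and $t_{setup}$] VLSI-09-23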
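The two borrowing limits can be compared with a short calculation. In this sketch the function names are hypothetical. The 1000 ps cycle time and 50 ps nonoverlap are made-up illustration values, and the 150 ps pulse width and 40 ps setup time are reused from the pulsed-latch example.

```python
def borrow_two_phase(t_c, t_setup, t_nonoverlap):
    """Maximum borrowing for 2-phase latches: T_c/2 - (t_setup + t_nonoverlap)."""
    return t_c / 2 - (t_setup + t_nonoverlap)


def borrow_pulsed(t_pw, t_setup):
    """Maximum borrowing for pulsed latches: t_pw - t_setup."""
    return t_pw - t_setup


# Hypothetical T_c and nonoverlap; setup and pulse width from the earlier example (ps).
b2 = borrow_two_phase(t_c=1000, t_setup=40, t_nonoverlap=50)
bp = borrow_pulsed(t_pw=150, t_setup=40)
print(f"2-phase: {b2} ps, pulsed: {bp} ps")
```

With these assumed values the 2-phase scheme can borrow several times more than the pulsed latch, which is consistent with the summary's description of 2-phase latches as offering lots of time borrowing.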
Clock Skew We have assumed zero clock skew. Clocks really have uncertainty in arrival time, which: decreases maximum propagation delay; increases minimum contamination delay; decreases time borrowing. Clock must arrive at all memory elements in time to load data. VLSI-09-24
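Clock Skew: Flip-Flops $t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$; $t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$. [Figure: flip-flops F1 and F2 with combinational logic between Q1 and D2; max-delay timing diagram marking $t_{pcq}$, $t_{pd}$, $t_{setup}$, and $t_{skew}$ within $T_c$; min-delay timing diagram marking $t_{skew}$, $t_{hold}$, $t_{ccq}$, and $t_{cd}$] VLSI-09-25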
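The effect of skew on both flip-flop constraints can be sketched numerically. The flip-flop parameters and the 1000 ps logic delay are taken from the earlier max-delay example. The 50 ps skew is a hypothetical value chosen only for illustration, and the function name is made up.

```python
def flop_constraints(t_pd, t_pcq, t_ccq, t_setup, t_hold, t_skew):
    """Return (minimum T_c, minimum t_cd) for a flip-flop pipeline with skew.

    T_c  >= t_pcq + t_pd + t_setup + t_skew
    t_cd >= t_hold - t_ccq + t_skew
    """
    min_cycle = t_pcq + t_pd + t_setup + t_skew
    min_cd = t_hold - t_ccq + t_skew
    return min_cycle, min_cd


params = dict(t_pd=1000, t_pcq=90, t_ccq=75, t_setup=62, t_hold=-10)
no_skew = flop_constraints(**params, t_skew=0)
with_skew = flop_constraints(**params, t_skew=50)  # 50 ps is an assumed skew
print("no skew:  ", no_skew)
print("with skew:", with_skew)
```

Skew lengthens the minimum cycle time and raises the required contamination delay by the same amount.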
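Clock Skew: Latches
2-Phase Latches:
$t_{pd} \le T_c - \underbrace{2 t_{pdq}}_{\text{sequencing overhead}}$
$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$
$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$
Pulsed Latches:
$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$
$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$
$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$
[Figure: latches L1 ($\phi_1$), L2 ($\phi_2$), L3 ($\phi_1$) with Combinational Logic 1 between Q1 and D2 and Combinational Logic 2 between Q2 and D3]
VLSI-09-26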
Two-Phase Clocking If setup times are violated, reduce clock speed. If hold times are violated, chip fails at any speed. In this class, working chips are most important, and there are no tools to analyze clock skew. An easy way to guarantee hold times is to use 2-phase latches with big nonoverlap times. Call these clocks $\phi_1$, $\phi_2$ (ph1, ph2). VLSI-09-27
Signal Skew Machine data signals must obey setup and hold times: avoid signal skew. VLSI-09-28
Data Shoot Through Latches do not cut combinational logic when clock is active. Latch-based machines must use multiple ranks of latches. Multiple ranks require multiple phases of clock. Data shoot through occurs if single-phase latch is used. VLSI-09-29
Unbalanced Delays Logic with unbalanced delays leads to inefficient use of logic. [Figure: two logic stages with unbalanced delays, one allowing a short clock period and one requiring a long clock period] VLSI-09-30
Retiming Solution Retiming moves memory elements through combinational logic: Property: Retiming changes encoding of values in registers, but proper values can be reconstructed with combinational logic. Retiming must preserve number of latches OR registers around a cycle. VLSI-09-31
Summary Flip-Flops: very easy to use, supported by all tools. 2-Phase Transparent Latches: lots of skew tolerance and time borrowing. Pulsed Latches: fast, some skew tolerance and time borrowing, hold time risk. VLSI-09-32
Outlines Introduction Sequencing Methods Latches and Flip-Flops Sequential System Design Conclusion VLSI-09-33
Dynamic Latch (1/3) Pass transistor latch: Pros: tiny; low clock load. Cons: $V_t$ drop; stored charge leaks away; backdriving; diffusion input. Used in 1970s. Transmission gate latch: no $V_t$ drop; stored charge still leaks away; backdriving; diffusion input; requires inverted clock. [Figure: pass-transistor and transmission-gate latch circuits, D to Q] VLSI-09-34
Dynamic Latch (2/3) Store charge on inverter gate capacitance: $\phi = 0$: transmission gate is off, inverter output is determined by storage node. $\phi = 1$: transmission gate is on, inverter output follows D input. VLSI-09-35
Dynamic Latch (3/3) Inverting buffer: no $V_t$ drop; stored charge leaks away; no backdriving. Fixes either the diffusion input (upper circuit) or the output noise sensitivity, with inverted output (lower circuit). Setup and hold times are determined by the transmission gate: must ensure that the value stored through the transmission gate is solid. [Figure: two buffered dynamic latch circuits with nodes D, X, Q] VLSI-09-36
Stick Diagram [Figure: stick diagram of the latch between $V_{DD}$ and $V_{SS}$ rails, with input D and output Q] VLSI-09-37
Physical Layout [Figure: physical layout of the latch between $V_{DD}$ and $V_{SS}$ rails, with input D and output Q] VLSI-09-38
Multiplexer Dynamic Latch VLSI-09-39
Static Latch (1/3) Must use feedback to restore value. Some latches are static on one phase (pseudo-static): load on one phase, activate feedback on the other phase. SR Latch VLSI-09-40
Static Latch (2/3) Tristate feedback: no $V_t$ drop; leakage compensation; backdriving risk; diffusion input; not isolated from output noise; requires inverted clock. Buffered input: no $V_t$ drop; leakage compensation; no backdriving; no diffusion input; not isolated from output noise; requires inverted clock. [Figure: tristate-feedback and buffered-input latch circuits with nodes D, X, Q] VLSI-09-41
Static Latch (3/3) Buffered output: no $V_t$ drop; leakage compensation; no backdriving; no diffusion input; isolated from output noise; requires inverted clock. Widely used in Artisan standard cells. Very robust (most important); rather large; rather slow (1.5 to 2 FO4 delays); high clock loading. [Figure: buffered-output latch circuit with nodes D, X, Q] VLSI-09-42
Multiplexer Static Latches Mux static latch: no $V_t$ drop; leakage compensation; no backdriving; no diffusion input; requires inverted clock. [Figure: negative latch and positive latch built from multiplexers] VLSI-09-43
Recirculating Quasi-Static Latch Eliminates the problem of the dynamic latch, where the value stored on the capacitor leaks away over time. Quasi-static: the latch data will vanish if the clocks are stopped (i.e., static on one phase). VLSI-09-44
Clocked Inverter $\phi = 0$: both clocked transistors are off, so the output is floating. $\phi = 1$: both clocked transistors are on, so the circuit acts as an inverter and drives the output. [Figure: circuit and symbol] VLSI-09-45
Clocked Inverter Latch $\phi = 0$: i1 is off, i2-i3 form feedback circuit. $\phi = 1$: i2 is off, breaking feedback; i1 is on, driving i3 and output. This static latch is transparent when $\phi = 1$. VLSI-09-46
Flip-Flops Not transparent: use multiple storage elements to isolate output from input. Edge-triggered: master-slave. VLSI-09-47
Master-Slave Flip-Flop $\phi = 0$: master latch is disabled; slave latch is enabled, but the master latch output is stable, so the slave passes the master's held value to the output. $\phi = 1$: master latch is enabled, loading value from input; slave latch is disabled, maintaining old output value. [Figure: master and slave latches in series, D to Q] VLSI-09-48
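The master-slave behavior described above can be sketched as two level-sensitive latches in series. This is a minimal zero-delay behavioral model, not a circuit-level one. The class and function names and the sample waveforms are hypothetical, and the enable polarities follow the description on this slide (master enabled when the clock is 1, slave enabled when it is 0).

```python
class Latch:
    """Level-sensitive latch: Q follows D while enabled, else holds."""

    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d
        return self.q


def master_slave(clk, d):
    """Return the Q trace of a master-slave flip-flop built from two latches."""
    master, slave = Latch(), Latch()
    trace = []
    for c, x in zip(clk, d):
        m = master.update(c == 1, x)  # master loads the input while clk = 1
        q = slave.update(c == 0, m)   # slave passes master's value while clk = 0
        trace.append(q)
    return trace


# Hypothetical waveforms, one sample per time step.
clk = [1, 1, 0, 0, 1, 1, 0, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
q = master_slave(clk, d)
print("Q:", q)
```

In this trace Q changes only at the steps where the clock goes from 1 to 0, even though D changes at other times, which is the non-transparent behavior the slide describes.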
Latch-Based Flip-Flop The storage nodes have to be refreshed at periodic intervals. [Figure: latch-based flip-flop circuits with nodes D, X, Q] VLSI-09-49
Resettable Latches and Flip-Flop VLSI-09-50
Static Latch-Based Flip-Flop: Clock Skew Problem The 1-1 overlap of the clock and its complement introduces a race condition. During the 1-1 overlap, node A is driven by both D and B. [Figure: two D-latches in series] VLSI-09-51
Outlines Introduction Sequencing Methods Latches and Flip-Flops Sequential System Design Conclusion VLSI-09-52
Sequential Machine Design Procedure
Step 1: Specification
Step 2: Formulation. Obtain a state diagram or state table.
Step 3: State Assignment. Obtain a state table if only a state diagram is available, and assign binary codes to the states.
Step 4: Flip-Flop Input Equation Determination. Select flip-flop types and derive flip-flop equations from next-state entries in the table.
Step 5: Output Equation Determination. Derive output equations from output entries in the table.
Step 6: Optimization. Optimize the equations.
Step 7: Technology Mapping. Find circuit from equations and map to flip-flops and gates.
Step 8: Verification. Verify correctness of final design.
VLSI-09-53
State Transition Graphs/Tables Basic functional description of FSM. Symbolic truth table for next-state, output functions: no structure of logic; no encoding of states. State transition graph and table are functionally equivalent. VLSI-09-54
State Assignment Must find binary encoding for symbolic states: this is state assignment. State assignment affects: combinational logic area; combinational logic delay; memory element area. May also encode some machine inputs/outputs. VLSI-09-55
Example: One-bit Counter (1/4) Easy to specify as one-bit counter. Harder to specify n-bit counter behavior. Can specify n-bit counter as structure made of 1-bit counters. State table:
Count  Cin  Next Count  Cout (Carry Out)
0      0    0           0
0      1    1           0
1      0    1           0
1      1    0           1
VLSI-09-56
One-bit Counter Implementation (2/4) XOR computes next value of this bit of counter. NAND/inverter computes carry-out. VLSI-09-57
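The one-bit cell and its composition into an n-bit counter can be sketched behaviorally as follows. This is a minimal sketch: the function names are hypothetical, the bit list is assumed to be least-significant-bit first, and the clocked latches are abstracted away so that each call to `step` represents one clock cycle.

```python
def counter_bit(count, cin):
    """One-bit counter cell: XOR gives the next bit, NAND plus inverter gives carry."""
    next_count = count ^ cin
    cout = 1 - (1 - (count & cin))  # NAND followed by inverter, i.e. AND
    return next_count, cout


def step(bits):
    """Advance an n-bit counter (LSB first) by one, chaining the carries."""
    carry = 1  # carry-in of 1 on the least significant bit increments the count
    next_bits = []
    for b in bits:
        nb, carry = counter_bit(b, carry)
        next_bits.append(nb)
    return next_bits, carry


# All four rows of the one-bit state table: (count, cin) -> (next count, cout).
rows = {(c, ci): counter_bit(c, ci) for c in (0, 1) for ci in (0, 1)}
print(rows)
print(step([1, 1, 0]))  # 3 -> 4 in a 3-bit counter, LSB first
```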
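One-bit Counter Sticks Diagram (3/4) [Figure: stick diagram between $V_{DD}$ and $V_{SS}$ rails with cells l1 (latch), n (nand), i (inv), x (xor), l2 (latch); clock lines for phases 1 and 2; $C_{in}$ and $C_{out}$] VLSI-09-58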
n-bit Counter Structure (4/4) VLSI-09-59
Example: 01 String Recognizer (1/5) Behavior of machine which recognizes 01 in continuous stream of bits. Operation: Waits for 0 to appear in state bit1. Goes into separate state bit2 when 0 appears. If 1 appears immediately after 0, can't have a 01 on next cycle, so can go back to wait for 0 in state bit1.
Time    0     1     2     3     4     5
Input   0     0     1     1     0     1
State   Bit1  Bit2  Bit2  Bit1  Bit1  Bit2
Next    Bit2  Bit2  Bit1  Bit1  Bit2  Bit1
Output  0     0     1     0     0     1
VLSI-09-60
State Transition Table (2/5) Operation: Waits for 0 to appear in state bit1. Goes into separate state bit2 when 0 appears. If 1 appears immediately after 0, can't have a 01 on next cycle, so can go back to wait for 0 in state bit1.
Input  Present  Next  Output
0      Bit1     Bit2  0
1      Bit1     Bit1  0
0      Bit2     Bit2  0
1      Bit2     Bit1  1
VLSI-09-61
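The state transition table can be run directly as a table-driven Mealy machine. The sketch below is one possible way to simulate it: the dictionary layout and the function name `run` are choices made here, and the input sequence is the one from the timing example on the first slide of this example.

```python
# (present state, input) -> (next state, output), copied from the transition table.
TABLE = {
    ("bit1", 0): ("bit2", 0),
    ("bit1", 1): ("bit1", 0),
    ("bit2", 0): ("bit2", 0),
    ("bit2", 1): ("bit1", 1),
}


def run(inputs, state="bit1"):
    """Feed a bit stream through the recognizer; return (outputs, visited states)."""
    outputs, states = [], []
    for x in inputs:
        states.append(state)
        state, y = TABLE[(state, x)]
        outputs.append(y)
    return outputs, states


outputs, states = run([0, 0, 1, 1, 0, 1])
print("states: ", states)
print("outputs:", outputs)
```

The output is 1 exactly on the cycles where a 1 arrives while the machine is in bit2, that is, immediately after a 0.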
State Transition Graph (3/5) Equivalent to state transition table: VLSI-09-62
01 Recognizer Encoding (4/5) Choose bit1=0, bit2=1, and then truth table is as follows:
Input  Present  Next  Output
0      0        1     0
1      0        0     0
0      1        1     0
1      1        0     1
VLSI-09-63
01 Recognizer Logic Implementation (5/5) After encoding, truth table can be implemented in gates. [Figure: gate-level implementation with D/Q memory elements] VLSI-09-64
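Reading the encoded truth table (bit1=0, bit2=1), one pair of logic equations that reproduces it is next = NOT input and output = input AND present. These equations are derived here from the table and are not taken from the slide's gate figure, so the actual gate choice in the figure may differ. The sketch below checks them against every row.

```python
# (input, present) -> (next, output), from the encoded truth table.
TRUTH_TABLE = {
    (0, 0): (1, 0),
    (1, 0): (0, 0),
    (0, 1): (1, 0),
    (1, 1): (0, 1),
}


def next_state(inp, present):
    """Derived equation (an assumption): next = NOT input."""
    return 1 - inp


def output(inp, present):
    """Derived equation (an assumption): output = input AND present."""
    return inp & present


matches = all(
    (next_state(i, p), output(i, p)) == expected
    for (i, p), expected in TRUTH_TABLE.items()
)
print("equations match truth table:", matches)
```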
Power Optimization Memory elements stop glitch propagation. [Figure: glitch] VLSI-09-65
Conclusions You should learn in depth about the following topics: latch; flip-flop; sequencing; sequential circuits; sequential system clock discipline. VLSI-09-66