EE-382M VLSI II FLIP-FLOPS Gian Gerosa, Intel Fall 2008 EE 382M Class Notes Page # 1 / 31
OUTLINE
- Trends
- LATCH Operation
- FLOP Timing Diagrams & Characterization
- Transfer-Gate Master-Slave FLIP-FLOP
- Merged Functions
- Clock Skew
- Other Topologies
- SCAN
- References
- Homework Discussion
Where are we going? Trends in high-performance systems: higher frequency leads to deeper pipelines or more parallelism, which leads to more transistors, which leads to more sequentials (FLOPs or LATCHes).
Consequences:
- Increased flip-flop overhead: in 12-15 stage pipeline microarchitectures the cycle time is ~22 FO4 delays, and the FLOP overhead (D-Q delay) of ~3 FO4 delays is ~14% of the cycle.
- Clock uncertainty (jitter and skew) also eats into the cycle time.
- Clock power increases.
Why work on Sequentials? In a 3.3 GHz processor (90 nm CMOS) the cycle is ~300 ps.
- Typical D-Q delay is ~90 ps.
- If one can design a faster sequential, say with a D-Q delay of ~60 ps, this represents a ~10% processor performance improvement.
- If in addition one can absorb 15 ps of uncertainties and/or embed one level of logic, this yields an additional 5-10% processor performance improvement.
- Attaining a 10-20% performance improvement via architecture enhancements is very expensive (area, power, complexity, etc.)!
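The percentages above can be checked with a few lines of arithmetic. A minimal Python sketch, assuming the logic delay per stage is unchanged so that every picosecond removed from the sequential overhead shortens the cycle one-for-one (the helper `frequency_gain` is hypothetical):

```python
def frequency_gain(cycle_ps: float, saved_ps: float) -> float:
    """Fractional frequency gain when saved_ps is removed from a cycle_ps cycle.

    Assumes the cycle shrinks one-for-one with the saved sequential overhead.
    """
    return cycle_ps / (cycle_ps - saved_ps) - 1.0


cycle = 300.0              # ~3.3 GHz cycle, in ps
flop_saving = 90.0 - 60.0  # D-Q delay cut from ~90 ps to ~60 ps
absorbed = 15.0            # uncertainty absorbed by the new sequential, in ps

gain_flop = frequency_gain(cycle, flop_saving)
gain_both = frequency_gain(cycle, flop_saving + absorbed)
print(f"faster sequential alone: {gain_flop:.1%}")
print(f"plus 15 ps absorbed:     {gain_both:.1%} (additional {gain_both - gain_flop:.1%})")
```

Cutting 30 ps from a 300 ps cycle raises frequency by ~11%, and absorbing another 15 ps adds ~6.5%. Frequency gain is an upper bound on processor performance gain, so these figures are consistent with the ~10% and additional 5-10% estimates.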
Basic LATCH Operation
[Figure: transparent-low and transparent-high latches (Din to Dout) with timing diagrams showing the transparent and opaque phases and the Tdq, Tsu and Th parameters.]
Difference between a LATCH and a FLOP
- FLIP-FLOP (edge triggered): Q only changes at the rising edge of the Clock.
- LATCH (transparent/opaque): Q follows the input Data while the latch is transparent and holds its value while it is opaque.
[Figure: F-F and Latch symbols with Clock, Data and Q waveforms.]
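The behavioral difference can be sketched in Python. This is an idealized, zero-delay model under stated assumptions: a transparent-high latch, a rising-edge FLOP, both initialized to 0, and clock and data sampled at discrete time steps (the function `simulate` is hypothetical).

```python
from typing import List, Tuple


def simulate(samples: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Return (latch_q, flop_q) traces for a sequence of (clock, data) samples.

    Latch: transparent while clock == 1 (Q follows D), opaque while clock == 0.
    FLOP:  loads D only on a 0 -> 1 clock transition.
    """
    latch_q, flop_q, prev_clk = 0, 0, 0
    latch_trace, flop_trace = [], []
    for clk, d in samples:
        if clk == 1:                    # transparent phase
            latch_q = d
        if prev_clk == 0 and clk == 1:  # rising edge
            flop_q = d
        prev_clk = clk
        latch_trace.append(latch_q)
        flop_trace.append(flop_q)
    return latch_trace, flop_trace


# Data changes while the clock is high (steps 2 and 6).
samples = [(0, 1), (1, 1), (1, 0), (0, 1), (0, 0), (1, 0), (1, 1), (0, 1)]
latch_q, flop_q = simulate(samples)
print("latch:", latch_q)
print("flop: ", flop_q)
```

The latch output tracks the data changes at steps 2 and 6 because it is transparent, while the FLOP output only changes at the two rising edges (steps 1 and 5).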
Building a FLOP with Two Latches
[Figure: two latches in series, Din to Dout, forming a FLOP.]
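A behavioral sketch of the two-latch construction, assuming a master latch transparent while the clock is low and a slave latch transparent while it is high (the usual arrangement for a rising-edge FLOP), ideal zero-delay latches and discrete time steps; the class and function names are hypothetical. It checks the master-slave pair against an ideal edge-triggered FLOP that captures the data present just before each rising edge.

```python
from typing import List


class Latch:
    """Ideal level-sensitive latch: transparent when clk == transparent_level."""

    def __init__(self, transparent_level: int) -> None:
        self.transparent_level = transparent_level
        self.q = 0

    def eval(self, clk: int, d: int) -> int:
        if clk == self.transparent_level:  # transparent: Q follows D
            self.q = d
        return self.q                      # opaque: Q holds


def master_slave(clks: List[int], data: List[int]) -> List[int]:
    """Dout trace of a master (transparent-low) + slave (transparent-high) pair."""
    master = Latch(transparent_level=0)
    slave = Latch(transparent_level=1)
    dout = []
    for clk, d in zip(clks, data):
        m = master.eval(clk, d)            # master output feeds the slave
        dout.append(slave.eval(clk, m))
    return dout


def edge_reference(clks: List[int], data: List[int]) -> List[int]:
    """Ideal rising-edge FLOP that captures D from the step before each edge."""
    q, out = 0, []
    for i, clk in enumerate(clks):
        if i > 0 and clks[i - 1] == 0 and clk == 1:
            q = data[i - 1]
        out.append(q)
    return out


clks = [0, 1, 1, 0, 0, 1, 1, 0]
data = [1, 0, 1, 1, 0, 1, 0, 0]
print(master_slave(clks, data))
print(edge_reference(clks, data))
```

Data changes while the clock is high (steps 1, 2, 5 and 6) never reach Dout, because the master is opaque during that phase.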
FLOP Delay: the sum of the setup time and the clock-to-output delay (Tsu + Tcq) is the only true measure of FLOP performance with respect to system speed (MAXDELAY):
$T_{cycle} = T_{cq} + T_{logic} + T_{su} + T_{skew}$, where Tlogic includes interconnect delay.
[Figure: a D-Q FLOP driving logic with N loads into a second D-Q FLOP, annotated with Tcq, Tlogic and Tsu.]
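Rearranging the cycle-time equation gives the maximum frequency. A minimal Python sketch; the Tcq and Tsu values are the 130 nm inverting FLIP-FLOP budget quoted later in these notes, while the Tlogic and Tskew values are hypothetical:

```python
def min_cycle_ps(tcq: float, tlogic: float, tsu: float, tskew: float) -> float:
    """Tcycle = Tcq + Tlogic + Tsu + Tskew (all in picoseconds)."""
    return tcq + tlogic + tsu + tskew


def fmax_ghz(cycle_ps: float) -> float:
    """Convert a cycle time in ps to a frequency in GHz."""
    return 1000.0 / cycle_ps


tcq, tsu = 65.0, 35.0        # 130 nm FLOP budget (Tsu + Tcq ~ 100 ps)
tlogic, tskew = 350.0, 50.0  # hypothetical logic and skew budgets

cycle = min_cycle_ps(tcq, tlogic, tsu, tskew)
print(f"Tcycle = {cycle:.0f} ps, Fmax = {fmax_ghz(cycle):.2f} GHz")
print(f"FLOP overhead (Tsu + Tcq) = {(tcq + tsu) / cycle:.0%} of the cycle")
```

Only the sum Tsu + Tcq enters the budget, which is why that sum, rather than either term alone, is the figure of merit.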
FLOP Timing Diagrams
[Figure: Clock, Din and Dout waveforms (volts vs. time, 0-1000 ps) annotated with Tsu, Thold and Tcq.]
Tsu: input setup time. Thold: input hold time. Tcq: clock-to-output delay. $T_{data \to out} = T_{su} + T_{cq}$ (picoseconds).
Functional Pass/Failure vs. Tsu and Th
[Figure: master internal node waveforms showing pass and fail cases as the input setup time and input hold time are varied.]
FLOP Characterization
Tsu: input setup time. Thold: input hold time. Tcq: clock-to-output delay. $T_{data \to out} = T_{su} + T_{cq}$ (picoseconds).
[Figure: Tcq and Tdata-to-out (picoseconds) vs. data-to-clock offset (-250 to 250 ps), marking the minimum Tcq, the 10% Tcq point, Tsu and Thold.]
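The 10% marker on the characterization plot suggests a common way to extract Tsu from such a sweep: Tsu is the smallest data-to-clock offset at which Tcq is still within 10% of its minimum. A minimal Python sketch, assuming a monotonic Tcq curve and using hypothetical sweep data (`math.inf` marks a functional failure):

```python
import math
from typing import Sequence


def setup_time_10pct(offsets_ps: Sequence[float], tcq_ps: Sequence[float]) -> float:
    """Smallest data-to-clock offset whose Tcq is within 10% of the minimum Tcq.

    offsets_ps: data-to-clock offsets (positive = data arrives before the clock).
    tcq_ps:     simulated Tcq at each offset; math.inf marks a failed capture.
    Assumes Tcq grows monotonically as the offset shrinks.
    """
    limit = 1.10 * min(tcq_ps)
    passing = [s for s, t in zip(offsets_ps, tcq_ps) if t <= limit]
    return min(passing)


# Hypothetical sweep of one Din transition.
offsets = [250, 200, 150, 100, 50, 25, 0, -25]
tcq = [60, 60, 60, 61, 64, 68, 80, math.inf]

tsu = setup_time_10pct(offsets, tcq)
tcq_at_tsu = dict(zip(offsets, tcq))[tsu]
print(f"Tsu = {tsu} ps, Tcq = {tcq_at_tsu} ps, Tdata-to-out = {tsu + tcq_at_tsu} ps")
```

Sweeping the other side (data changing after the clock) gives Thold in the same way, and the procedure is repeated for both Din transitions.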
MAXDELAY
[Figure: two FLOPs separated by logic (signals D1 and Q1); timing diagram over one Tcycle showing Tcq, Tlogic and Tsu.]
$T_{logic} < T_{cycle} - (T_{cq} + T_{su})$, or equivalently $T_{cycle} > T_{cq} + T_{logic} + T_{su}$
MAXDELAY with Clock Skew
[Figure: two FLOPs separated by logic (signals D1 and Q1) with clock skew Tskew; timing diagram showing Tcq, Tlogic and Tsu within Tcycle.]
$T_{logic} < T_{cycle} - (T_{cq} + T_{su} + T_{skew})$
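The MAXDELAY inequality translates directly into a logic budget and a setup slack. A minimal Python sketch with hypothetical numbers, showing how a path that fits with zero skew fails once 20 ps of skew is charged against it:

```python
def max_logic_ps(tcycle: float, tcq: float, tsu: float, tskew: float = 0.0) -> float:
    """Largest Tlogic meeting MAXDELAY: Tcycle - (Tcq + Tsu + Tskew)."""
    return tcycle - (tcq + tsu + tskew)


def setup_slack_ps(tlogic: float, tcycle: float, tcq: float, tsu: float,
                   tskew: float = 0.0) -> float:
    """Positive slack means the path meets MAXDELAY."""
    return max_logic_ps(tcycle, tcq, tsu, tskew) - tlogic


tcycle, tcq, tsu = 300.0, 65.0, 35.0  # hypothetical values, in ps
tlogic = 190.0
print("no skew:    slack =", setup_slack_ps(tlogic, tcycle, tcq, tsu))
print("20 ps skew: slack =", setup_slack_ps(tlogic, tcycle, tcq, tsu, tskew=20.0))
```

The same 190 ps of logic has +10 ps of slack without skew and -10 ps with it.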
MINDELAY
[Figure: two FLOPs separated by logic (signals D1 and Q1); timing diagram showing Tcq, Tlogic, Tsu and Thold.]
$T_{logic} > T_{hold} - T_{cq} + T_{skew}$
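The MINDELAY check does not involve Tcycle, so a hold violation cannot be fixed by slowing the clock. A minimal Python sketch with hypothetical Thold, Tcq and Tskew values, checking a back-to-back FLOP pair (Tlogic = 0):

```python
def min_logic_ps(thold: float, tcq: float, tskew: float) -> float:
    """Tlogic must exceed Thold - Tcq + Tskew to avoid a MINDELAY race."""
    return thold - tcq + tskew


def hold_ok(tlogic: float, thold: float, tcq: float, tskew: float) -> bool:
    return tlogic > min_logic_ps(thold, tcq, tskew)


thold, tcq = 20.0, 65.0                       # hypothetical FLOP values, in ps
print(hold_ok(0.0, thold, tcq, tskew=50.0))   # back-to-back FLOPs, 50 ps skew
print(hold_ok(0.0, thold, tcq, tskew=30.0))   # same pair, 30 ps skew
print(hold_ok(10.0, thold, tcq, tskew=50.0))  # 10 ps of delay added to the path
```

With 50 ps of skew the direct connection races (the minimum Tlogic is 5 ps), so delay must be added; with 30 ps of skew, Tcq > Thold + Tskew and the hazard disappears.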
DESIGN WINDOW
$T_{hold} - T_{cq} + T_{skew} < T_{logic}$ and $T_{logic} < T_{cycle} - (T_{cq} + T_{su} + T_{skew})$
If $T_{cq} > T_{hold} + T_{skew}$, the MINDELAY hazard is removed: the lower bound becomes negative, and $T_{logic} \ge 0$ always.
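Both bounds follow from writing each check as an arrival-time comparison, treating Tskew as the worst-case (adverse) skew in each, as the slides do. Subtracting the lower bound from the upper bound gives the width of the window, in which Tcq cancels:

```latex
\begin{aligned}
\text{MINDELAY: } & T_{cq} + T_{logic} > T_{hold} + T_{skew}
  \;\Rightarrow\; T_{logic} > T_{hold} - T_{cq} + T_{skew} \\
\text{MAXDELAY: } & T_{cq} + T_{logic} + T_{su} + T_{skew} < T_{cycle}
  \;\Rightarrow\; T_{logic} < T_{cycle} - (T_{cq} + T_{su} + T_{skew}) \\
\text{Window width: } & \bigl[T_{cycle} - (T_{cq} + T_{su} + T_{skew})\bigr] - \bigl[T_{hold} - T_{cq} + T_{skew}\bigr]
  = T_{cycle} - (T_{su} + T_{hold}) - 2\,T_{skew}
\end{aligned}
```

The usable range of logic delays therefore shrinks by the FLOP sampling window (Tsu + Thold) and by twice the skew.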
T-G Master-Slave FLOP (buffered, non-inverting)
TIMING: Tsu ~ 1 TG + 2 inverters; Th ~ 1 inverter; Tcq ~ 1 TG + 1 inverter.
[Figure: transfer-gate master-slave FLOP, Din to Dout, in non-time-borrowing and time-borrowing versions.]
Time borrowing keeps the MASTER open longer by ~2 inverter delays; be careful about MINDELAYs.
The output buffer isolates SLAVE latch timing optimization and sensitivities from the output load.
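The stage counts above give a quick first-order estimate once per-stage delays are known. A minimal Python sketch; the transfer-gate and inverter delays are hypothetical placeholders, and the model ignores loading and slope effects:

```python
def tg_ms_estimate(t_tg: float, t_inv: float) -> dict:
    """First-order timing of the buffered T-G master-slave FLOP from stage counts."""
    tsu = t_tg + 2 * t_inv   # Tsu ~ 1 TG + 2 inverters
    th = t_inv               # Th  ~ 1 inverter
    tcq = t_tg + t_inv       # Tcq ~ 1 TG + 1 inverter
    return {"Tsu": tsu, "Th": th, "Tcq": tcq, "Tsu+Tcq": tsu + tcq}


print(tg_ms_estimate(t_tg=10.0, t_inv=15.0))  # hypothetical delays in ps
```

Whatever the per-stage values, the Tsu + Tcq overhead is ~2 TG + 3 inverter delays.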
Merged Function inverting FLOP
[Figure: inverting FLOP with a merged logic function of inputs A and B; output Dout.]
RESETTABLE Master-Slave FLOP (asynchronous)
[Figure: master-slave FLOP with input Din, output Dout and reset input Rb.]
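Clock Skew Impact on Fmax
[Figure: two master-slave FLOPs (Din to Dout), each clocked through a local clock buffer (LCB) from the GLOBAL clock; τ1, τ2 and τ3 mark clock-path delays.]
$T_{cycle} = T_{cq} + T_{logic} + T_{su} + T_{uncertainty}$, with $T_{uncertainty} = skew + jitter$.
The skew term is set by the clock-path delays τ1, τ2 and τ3.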
Other Circuit Topologies for M-S FLOPS
- C2MOS
- Hybrid Latch Flip-Flop (HLFF)
- Pulse Latch
In Backup:
- True Single-Phase Clock (TSPC) FLOP
- K-6 Dual-Rail ETL
- Semi-Dynamic Flip-Flop (SDFF)
C2MOS FLOPS
[Figure: C2MOS master and slave stages clocked by clk, Din to Q.]
- Robustness to slope
- Low-power feedback
- Poor driving capability
Hybrid Latch Flip-Flop (HLFF) (AMD K-6, Partovi, ISSCC 1996)
[Figure: HLFF schematic with Din, Clk, Dclk_, internal node N and Dout.]
Hybrid Latch Flip-Flop (HLFF) waveforms
TIMING: sampling window ~3 inverters; Tsu ~0 to slightly negative; Th > sampling window; Tcq ~2 inverters.
[Figure: Clk, Dclk_, Din, N and Dout waveforms with valid windows.]
Pulse Latch
[Figure: pulse generator with delay τ converting Clock into pclk, which clocks a latch from Din to Dout.]
Pulse Latch Waveforms
TIMING: sampling window ~ NAND + τ; Tsu ~0 to slightly negative; Th > sampling window; Tcq ~2 inverters.
[Figure: Clock, Pclk, Din and Dout waveforms with valid windows.]
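The negative setup time and the hold requirement can be illustrated with an idealized, zero-delay pulse-latch model. This Python sketch assumes a pulse generator that holds pclk high for a fixed number of time steps after each rising Clock edge (standing in for the NAND + τ window) and a latch that is transparent while pclk is high; names and numbers are hypothetical:

```python
from typing import List


def make_pclk(clock: List[int], width: int) -> List[int]:
    """Idealized pulse generator: pclk is high for `width` steps after each rising edge."""
    pclk = [0] * len(clock)
    for t in range(1, len(clock)):
        if clock[t - 1] == 0 and clock[t] == 1:
            for k in range(t, min(t + width, len(clock))):
                pclk[k] = 1
    return pclk


def pulse_latch(pclk: List[int], din: List[int]) -> List[int]:
    """Latch that is transparent while pclk is high and holds otherwise."""
    q, dout = 0, []
    for p, d in zip(pclk, din):
        if p:
            q = d
        dout.append(q)
    return dout


clock = [0] * 5 + [1] * 10 + [0] * 5  # rising edge at step 5
pclk = make_pclk(clock, width=3)      # sampling window: steps 5-7

late_din = [0] * 6 + [1] * 14         # data arrives 1 step AFTER the edge
early_change = [1] * 7 + [0] * 13     # next value arrives inside the window

print(pulse_latch(pclk, late_din)[-1])      # still captured: negative setup
print(pulse_latch(pclk, early_change)[-1])  # overwritten: hold violation
```

Data arriving after the Clock edge but inside the window is still captured, while data must stay stable for the whole window, which is why Th exceeds the sampling window.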
FLOP with SCAN
[Figure: FLOP (Din to Dout, FUNCTIONAL path) with an attached SCAN GADGET (Scan_in to Scan_out) clocked by scan clocks A and B.]
A Typical Scan Path
[Figure: scan path from SI to SO through scannable FLOPs (DI, Q, Store_en), scannable latches (DO) and Hold_scan FLOPs (non-destructive scan), clocked by scan clocks A and B.]
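The scan-path idea (shift a test pattern in, apply one functional capture, shift the response out) can be sketched behaviorally. This Python model is a simplification under stated assumptions: it abstracts away the A/B scan clocks, Store_en and non-destructive (Hold_scan) scan, treats one `shift` call as one complete scan-shift cycle, and uses a hypothetical inverter array as the logic under test:

```python
from typing import Callable, List


class ScanChain:
    """Simplified scan chain; element i is the i-th scannable FLOP from SI."""

    def __init__(self, length: int) -> None:
        self.q = [0] * length

    def shift(self, scan_in: int) -> int:
        """One scan-shift cycle; returns the bit presented at SO."""
        scan_out = self.q[-1]
        self.q = [scan_in] + self.q[:-1]
        return scan_out

    def capture(self, din: List[int]) -> None:
        """One functional cycle: every FLOP loads its Din."""
        self.q = list(din)


def scan_test(chain: ScanChain, pattern: List[int],
              logic: Callable[[List[int]], List[int]]) -> List[int]:
    for bit in reversed(pattern):       # the bit for the last FLOP enters first
        chain.shift(bit)
    chain.capture(logic(chain.q))       # apply pattern, capture the response
    unloaded = [chain.shift(0) for _ in range(len(chain.q))]
    return unloaded[::-1]               # SO delivers the last FLOP first


def invert_all(q: List[int]) -> List[int]:
    """Hypothetical logic under test: bitwise inversion."""
    return [1 - b for b in q]


print(scan_test(ScanChain(4), [1, 0, 1, 1], invert_all))
```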
QUICK AREA and TIMING budgets in 130 nm
Inverting FLIP-FLOP: area ~60 μm², Tsu ~35 ps, Tcq ~65 ps; total FLOP timing overhead ~100 ps.
Scan gadget area ~35 μm²; TOTAL scan inverting FLOP ~95 μm².
[Figure: inverting FLIP-FLOP layout; this layout does not include scan.]
QUICK AREA and TIMING budgets in 65 nm
Inverting FLIP-FLOP: area ~15 μm², Tsu ~? ps, Tcq ~? ps; total FLOP timing overhead ~? ps.
[Figure: layout annotated with input (0.45 μm, 0.25 μm), rest of FLOP, and output (0.90 μm, 0.50 μm).]
Scan gadget area ~9 μm²; TOTAL scan inverting FLOP ~24 μm².
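The two area budgets can be compared directly. A short Python calculation using only the numbers on the 130 nm and 65 nm slides:

```python
budgets = {  # areas in square micrometers, from the 130 nm and 65 nm slides
    "130nm": {"flop": 60.0, "scan_gadget": 35.0},
    "65nm": {"flop": 15.0, "scan_gadget": 9.0},
}

totals = {}
for node, area in budgets.items():
    totals[node] = area["flop"] + area["scan_gadget"]
    share = area["scan_gadget"] / totals[node]
    print(f"{node}: total {totals[node]:.0f} um^2, scan gadget {share:.1%} of it")

shrink = totals["130nm"] / totals["65nm"]
ideal = (130 / 65) ** 2  # ideal area scaling for a 2x linear shrink
print(f"area ratio {shrink:.2f} vs ideal {ideal:.2f}")
```

The scan gadget is roughly 37% of the scan FLOP area at both nodes, and the ~3.96x area reduction is close to the ideal 4x for a 130 nm to 65 nm shrink.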
Design Goals
Target:
- Small load
- Shortest Din-to-Dout direct path
- Low-power feedback
- Simultaneously optimize both master and slave latches
- High driving capability
Optimize the speed × power product while:
- Minimizing Tsu + Thold (smallest sampling window)
- Reducing sensitivity to slew rate and skew
- Not allowing floating nodes
Characterization:
- Use worst-case Tcq + Tsu for MAXDELAY analysis.
- Use worst-case Thold for MINDELAY analysis.
- Take into account all sources of power dissipation.
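The characterization rule can be sketched as a reduction over all simulated cases (for example the 0->1 and 1->0 Din transitions). A minimal Python sketch, assuming each case reports Tsu, Tcq and Thold in picoseconds and taking the worst per-case Tsu + Tcq sum; the record type and the values are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    transition: str  # e.g. "0->1" or "1->0"
    tsu: float       # ps
    tcq: float       # ps
    thold: float     # ps


def maxdelay_overhead(cases: List[CaseResult]) -> float:
    """Worst-case Tsu + Tcq, used for MAXDELAY analysis."""
    return max(c.tsu + c.tcq for c in cases)


def mindelay_hold(cases: List[CaseResult]) -> float:
    """Worst-case (largest) Thold, used for MINDELAY analysis."""
    return max(c.thold for c in cases)


cases = [CaseResult("0->1", tsu=30.0, tcq=70.0, thold=10.0),
         CaseResult("1->0", tsu=40.0, tcq=65.0, thold=15.0)]
print(maxdelay_overhead(cases), mindelay_hold(cases))
```

A more pessimistic alternative adds the individual worst-case Tsu and worst-case Tcq, even when they come from different cases.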
References
1. A. Chandrakasan, W. J. Bowhill, F. Fox, Design of High-Performance Microprocessor Circuits, IEEE Press, New York, 2001. Chapter 11, "Clocked Storage Elements," by Hamid Partovi, pp. 207-234.
2. V. G. Oklobdzija, The Computer Engineering Handbook, CRC Press, Boca Raton, Florida, 2002. Chapter 10.2, "Latches and Flip-Flops," by Fabian Klass, pp. 10.34-10.69.
3. R. J. Baker, H. W. Li, D. E. Boyce, CMOS Circuit Design, Layout, and Simulation, IEEE Press, New York, 1998. Chapter 13, pp. 255-274.
4. V. G. Oklobdzija et al., Digital System Clocking: High-Performance and Low-Power Aspects, Wiley-IEEE Press, 2003, 264 pages.
Reference 2 has a very nice treatment of FLOPS/LATCHES, MIN/MAXDELAY, SKEW, etc., with plenty of timing diagrams.
BACKUP
Transfer-Gate (T-G) Master-Slave FLOP
- Low-power feedback
- Unbuffered inputs: input capacitance depends on the phase of the clock, and long routes cause over-shoot and under-shoot, so wire length at the input must be restricted. A buffered input addresses these issues.
- Low power
- Small clock-to-output delay, but positive setup time
- Scan, mux and other simple functions are easily embedded
Hybrid Latch Flip-Flop Highlights
Flip-flop features:
- Single-phase clock
- Edge triggered, on one clock edge
Latch features:
- Soft edge property: brief transparency, equal to 3 inverter delays
- Negative setup time allows slack passing
- Absorbs skew
- Minimum delay between flip-flops must be controlled
Fully static.
Possible to incorporate logic.
ATPG Sequence Timing
[Figure: Aclk and Bclk (frequency = 1/16 G), G, SI, DI, DO, SO, STORE_EN and SHIFT_EN waveforms over the 1st and 2nd system cycles; annotations: launch, capture in slave (DC stuck-at), capture at speed in slave, capture at speed in master, transition fault testing, observe master.]
Merged Function MUX-FLOP
[Figure: MUX-FLOP with data inputs A, B, C and D, select inputs SelA_, SelB_, SelC_ and SelD_, and output Dout.]
Another RESETTABLE Master-Slave FLOP (synchronous)
[Figure: master-slave FLOP with input Din, output Dout and reset input Rb.]
True Single-Phase Clock (TSPC) FLOP
[Figure: TSPC FLOP with MASTER, PRE-CHARGE and SLAVE stages; Din, internal nodes X and Y, and Dout.]
Clock power is low; no local clock inversion is required.
True Single-Phase Clock FLOP Waveforms
TIMING: Tsu ~2 inverters; Th ~2 inverters; Tcq ~3 inverters.
[Figure: Din, X, Y and Dout waveforms with valid windows.]
Semi-Dynamic Flip-Flop (SDFF)
[Figure: SDFF schematic with internal nodes N and K and delayed clock Dclk.]
- Soft edge conditioned by data, since the first stage is pre-charged; a cross-coupled latch is added for robustness.
- Small penalty for adding logic.
- The latch has one transistor less in the stack, so it is faster than HLFF, but a 1-1 glitch exists.
Semi-Dynamic Flip-Flop Waveforms
TIMING: sampling window ~2 inverters + 1 NAND; Tsu ~0 to slightly negative; Th > sampling window; Tcq ~2 inverters.
[Figure: Clk, Dclk, D, N, K and Q waveforms with valid windows.]
K-6 Dual-Rail ETL
[Figure: dual-rail ETL schematic with Pch, nodes A and B, and Dclk_; annotation: determines A, B, Q and Q_ pulse widths.]
K-6 Dual-Rail Waveforms
TIMING: sampling window ~3 inverters; Tsu ~0 to slightly negative; Th > sampling window; Tcq ~2 inverters.
[Figure: Clk, Dclk_, D, A, B, Pch, Q, T and Q_ waveforms; T is determined by 4 inversions.]
HMK#3 Problem 1. For both Din transitions (0->1 and 1->0), determine the input setup time Tsu, input hold time Thold, and clock-to-output delay Tcq for the following 4 FLIP-FLOPS (a, b, c, d). Use a 70 ps slew rate (full rail) for Din and clock, and use the 130 nm CMOS transistor models. These designs all drive a 4.2/2.1 inverter. Show ALL your work; also answer the following questions for each design:
a. List 3 deficiencies of this design (hint: look at designs b and c). Will this design work at a cycle time of 450 ps? Why or why not?
b. Is the Din input capacitance lower than in design a? What about the clock capacitance?
c. What are the benefits of placing the slave latch off to the side? Is this a time-borrowing FLOP? Is the clock capacitance lower than in design b? Is there any benefit in clocking the master LATCH feedback?
d. This design is a pulsed LATCH. Describe its behaviour with timing diagrams. Compared to a traditional FLIP-FLOP scheme, list ONE advantage and ONE disadvantage.
Simulation Tips: use HSPICE .IC statements to properly initialize these sequential circuits.
Homework # 3, Problem #1: FLOP design A
[Figure: schematic of design A; annotated device sizes include 0.13/0.6, 0.56, 1.4 and 0.7; nodes Din, din and dout; output inverter 4.2/2.1 driving Out (18.0).]
Homework # 3, Problem #1: FLOP design B
[Figure: schematic of design B; annotated device sizes include 0.13/0.6, 0.56, 1.4 and 0.7; nodes Din, din and Dout_b; output inverter 4.2/2.1 driving Out (18).]
Homework # 4, Problem #2: FLOP design C
[Figure: schematic of design C; annotated device sizes include 0.13/0.6, 0.13, 0.56, 1.4 and 0.7; nodes Din, din and Dout_b; output inverter 4.2/2.1 driving Out (18.0).]
Homework # 4, Problem #2: FLOP design D
[Figure: schematic of design D (pulsed latch clocked by pclk); annotated device sizes include 0.13/0.6, 0.13, 0.56, 1.4 and 0.7; nodes Din, din and Dout; output inverter 4.2/2.1 driving Out (18.0).]