
EE241 - Spring 2013
Advanced Digital Integrated Circuits
Lecture 14: Statistical timing, Latches

Announcements
- Homework 3 posted this week, due after Spring break
- Quiz #2 today
- Midterm project report due on Wednesday
- No office hour today
- Makeup office hour tomorrow 11-12 in BWRC

Outline
- Last lecture: Latch-based timing
- This lecture: Variability and timing; Latches

4. Design for performance
B. Statistical timing

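Pictorial view of setup and hold tests
[Figure: timing diagram of the two tests. Hold test: data must be stable for the hold time after the latest clock arrival time; this defines the early RAT, and early slack = actual early AT - early RAT. Setup test: data must be stable for the setup time before the earliest clock arrival time (next cycle); this defines the late RAT, and late slack = late RAT - actual late AT. Between the actual early AT and the actual late AT, zero or more switchings are allowed.]

Handling of across-chip variation
- Each gate has a range of delay: [lb, ub]
  - The lower bound is used for early timing
  - The upper bound is used for late timing
  - This is called an early/late split
- Static timing obtains bounds on timing slacks
- Timing is performed as one forward pass (arrival times) and one backward pass (required arrival times)
[Figure: the setup test combines the launching late path with the capturing early path; the hold test combines the launching early path with the capturing late path. LF = launching flip-flop, CF = capturing flip-flop.]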
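The early/late split and the forward/backward passes described above can be sketched in a few lines of Python. This is a minimal illustration, not a production timer: the graph, the delay ranges and the clock numbers are all hypothetical, each edge carries an (lb, ub) range, and the hold check uses the latest capture clock while the setup check uses the earliest next-cycle capture clock, as in the setup/hold picture.

```python
# A minimal early/late-split static timing sketch. All numbers are hypothetical (ps).
# Each edge (u, v) carries a delay range (lb, ub): lb is used for early timing, ub for late.
EDGES = {
    ("LF", "g1"): (20.0, 25.0),
    ("g1", "g2"): (30.0, 36.0),
    ("LF", "g2"): (45.0, 55.0),
    ("g2", "CF"): (10.0, 12.0),
}
ORDER = ["LF", "g1", "g2", "CF"]  # topological order: source first, sink last


def forward_pass(edges, order, launch_early, launch_late):
    """Propagate early (min) and late (max) arrival times from the source."""
    early, late = {order[0]: launch_early}, {order[0]: launch_late}
    for v in order[1:]:
        fanin = [(u, d) for (u, w), d in edges.items() if w == v]
        early[v] = min(early[u] + lb for u, (lb, _ub) in fanin)
        late[v] = max(late[u] + ub for u, (_lb, ub) in fanin)
    return early, late


def backward_pass(edges, order, sink_early_rat, sink_late_rat):
    """Propagate early and late required arrival times back from the sink."""
    early_rat, late_rat = {order[-1]: sink_early_rat}, {order[-1]: sink_late_rat}
    for u in reversed(order[:-1]):
        fanout = [(w, d) for (x, w), d in edges.items() if x == u]
        # Late RAT: data must arrive before the tightest (smallest) fanout requirement.
        late_rat[u] = min(late_rat[w] - ub for w, (_lb, ub) in fanout)
        # Early RAT: data must not arrive before the tightest (largest) requirement.
        early_rat[u] = max(early_rat[w] - lb for w, (lb, _ub) in fanout)
    return early_rat, late_rat


# Hypothetical clocking numbers (ps).
LAUNCH_EARLY, LAUNCH_LATE = 100.0, 110.0    # data launched at LF
CAPTURE_EARLY, CAPTURE_LATE = 95.0, 105.0   # clock arrival at CF
PERIOD, SETUP, HOLD = 200.0, 15.0, 10.0

early_at, late_at = forward_pass(EDGES, ORDER, LAUNCH_EARLY, LAUNCH_LATE)
early_rat, late_rat = backward_pass(
    EDGES, ORDER,
    sink_early_rat=CAPTURE_LATE + HOLD,            # hold: latest clock + hold time
    sink_late_rat=CAPTURE_EARLY + PERIOD - SETUP,  # setup: earliest next-cycle clock - setup
)
setup_slack = {n: late_rat[n] - late_at[n] for n in ORDER}
hold_slack = {n: early_at[n] - early_rat[n] for n in ORDER}
print(setup_slack["CF"], hold_slack["CF"])  # 97.0 40.0
```

Because the RATs are pushed back with the same bounds, the slack at the launch point equals the slack at the capture point along the critical path; that is what lets a single forward and a single backward pass produce slacks everywhere.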

How is the early/late split computed?
- The best way is to take known effects into account during characterization of library cells
  - History effect, simultaneous switching, pre-charging of internal nodes, etc.
  - This drives separate characterization for early and late; this is the most accurate method
- Failing that, the most common method is derating factors
  - Example: Late delay = library delay * 1.05; Early delay = library delay * 0.95
- The IBM way of achieving derating is LCD factors (Linear Combination of Delay) (FC = fast chip, SC = slow chip; see next page)
  - Late delay = L1 * FC_delay + L2 * NOM_delay + L3 * SC_delay
  - Early delay = E1 * FC_delay + E2 * NOM_delay + E3 * SC_delay
  - Across-chip variation is therefore assumed to be a fixed proportion of chip-to-chip variation for each cell type

IBM delay modeling*
At a given corner:
- late delay = intrinsic + systematic + random
- early delay = intrinsic - systematic - random
where intrinsic is the chip mean, systematic is the systematic across-chip variation (ACV) and random is the random ACV.
*P. S. Zuchowski, ICCAD 04
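The three ways of producing early and late delays listed above (plain derating, LCD factors, and the intrinsic/systematic/random corner model) reduce to simple arithmetic. The sketch below assumes hypothetical LCD coefficients that each sum to 1; IBM's actual coefficient values are not given on the slide.

```python
def derated_delays(library_delay, early_factor=0.95, late_factor=1.05):
    """Early/late split from simple derating factors (the 0.95/1.05 example)."""
    return early_factor * library_delay, late_factor * library_delay


def lcd_delay(fc_delay, nom_delay, sc_delay, coeffs):
    """Linear combination of the fast-chip, nominal and slow-chip delays."""
    c_fc, c_nom, c_sc = coeffs
    return c_fc * fc_delay + c_nom * nom_delay + c_sc * sc_delay


# Hypothetical (L1, L2, L3) and (E1, E2, E3); not IBM's actual values.
LATE_COEFFS = (0.0, 0.85, 0.15)
EARLY_COEFFS = (0.15, 0.85, 0.0)


def corner_delays(intrinsic, systematic, random_acv):
    """Intrinsic/systematic/random model at one corner: returns (early, late)."""
    return intrinsic - systematic - random_acv, intrinsic + systematic + random_acv


early, late = derated_delays(100.0)
fc, nom, sc = 80.0, 100.0, 125.0  # hypothetical corner delays of one cell (ps)
late_lcd = lcd_delay(fc, nom, sc, LATE_COEFFS)
early_lcd = lcd_delay(fc, nom, sc, EARLY_COEFFS)
# With these coefficients the split equals 0.15 * (SC - FC).
print(late_lcd - early_lcd, 0.15 * (sc - fc))
```

With these assumed coefficients, late - early = (L1 - E1) * FC + (L2 - E2) * NOM + (L3 - E3) * SC = 0.15 * (SC - FC), so the across-chip split is indeed a fixed fraction of the chip-to-chip spread for this cell.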

Traditional timing corners
[Figure: chip-to-chip variation spans the fast chip to the slow chip; intra-chip variation within each gives the four corners fast chip early, fast chip late, slow chip early and slow chip late.]

The problem with an early/late split
- The early/late split is very useful
  - Allows bounds during delay modeling
  - Any unknown or hard-to-model effect can be swept under the rug of an early/late split
- But it has problems
  - Additional pessimism (which may be desirable)
  - Unnecessary pessimism (which is never desirable)
[Figure: setup test with the launching late path from LF and the capturing early path to CF. The physically common portion of the two clock paths can't be both fast and slow at the same time.]

Additional pessimism: Clock tree view
[Figure: clock tree with points FP1, FP2 and FP3, launching flip-flops LF1, LF2 and LF3, combinational logic blocks, and capturing flip-flop CF.]

How to have less pessimism?
- Common path pessimism removal (CPPR)
- Account for correlations
- Credit for statistical averaging of random variation
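Common path pessimism removal can be illustrated on a tiny clock tree. The sketch below is an assumption-laden toy: the tree, node names and (early, late) stage delays are hypothetical. It finds the deepest node shared by the launch and capture clock paths and credits back the late minus early arrival at that point, since the shared segment physically has only one delay.

```python
# Hypothetical clock tree: PARENT gives each node's driver; EDGE_DELAY gives the
# (early, late) delay of the stage that ends at that node (ps).
PARENT = {"root": None, "b1": "root", "b2": "b1", "b3": "b2", "b4": "b2"}
EDGE_DELAY = {
    "root": (0.0, 0.0),
    "b1": (50.0, 60.0),
    "b2": (40.0, 48.0),
    "b3": (20.0, 24.0),  # clock pin of the launching flip-flop (LF)
    "b4": (22.0, 26.0),  # clock pin of the capturing flip-flop (CF)
}


def path_from_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def clock_arrival(node, mode):
    idx = 0 if mode == "early" else 1
    return sum(EDGE_DELAY[n][idx] for n in path_from_root(node))


def common_point(a, b):
    """Deepest node shared by the clock paths to a and b."""
    common = None
    for x, y in zip(path_from_root(a), path_from_root(b)):
        if x != y:
            break
        common = x
    return common


def cppr_credit(launch, capture):
    # The common segment was taken late on one side and early on the other;
    # it can't be both, so give back the late - early difference.
    cp = common_point(launch, capture)
    return clock_arrival(cp, "late") - clock_arrival(cp, "early")


def setup_slack(launch, capture, data_late, period, setup, use_cppr=True):
    at = clock_arrival(launch, "late") + data_late
    rat = clock_arrival(capture, "early") + period - setup
    slack = rat - at
    if use_cppr:
        slack += cppr_credit(launch, capture)
    return slack


print(setup_slack("b3", "b4", 300.0, 500.0, 20.0, use_cppr=False))  # 160.0
print(setup_slack("b3", "b4", 300.0, 500.0, 20.0))                  # 178.0
```

Here the 18 ps credit is exactly the late/early spread accumulated from the root down to b2. A hold check with launch early and capture late could be credited in the same way for same-edge tests; that variant is not shown.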

Statistical timing
[Figure: the same timing graph evaluated two ways. Deterministic: a + c, then MAX with b, using single delay values. Statistical: the same operations applied to delay distributions.]

The problem of correlations
- There are many reasons for correlations
  - Chip-to-chip variations are perfectly correlated within a single chip
  - Same circuit types
  - Same device families
  - Same metal levels
  - Same voltage islands
  - Same regions of the chip
  - Dependence on common sources of variation
  - Reconvergent fanout
  - Etc.
- In a reasonably sized chip, there may be 100 million timing quantities, so we don't handle correlations in the classical way
  - Not by storing and manipulating a 100M x 100M covariance matrix
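A quick Monte Carlo experiment shows why the statistical version of "a + c, then MAX with b" differs from the deterministic one, and why correlation matters. All means, sigmas and the correlation value are hypothetical; a and b are correlated through one shared standard-normal factor, and c is independent.

```python
import math
import random
import statistics


def deterministic_at(a, b, c):
    """Deterministic STA: add the single delay values, then take the max."""
    return max(a + c, b)


def statistical_at(mu, sigma, rho_ab=0.0, n=50_000, seed=1):
    """Monte Carlo estimate of mean and std of max(a + c, b) for Gaussian delays.

    rho_ab correlates a and b through a shared (global) standard-normal factor;
    c is independent of both.
    """
    rng = random.Random(seed)
    w_g, w_i = math.sqrt(rho_ab), math.sqrt(1.0 - rho_ab)
    samples = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)  # shared source of variation
        a = mu["a"] + sigma["a"] * (w_g * g + w_i * rng.gauss(0.0, 1.0))
        b = mu["b"] + sigma["b"] * (w_g * g + w_i * rng.gauss(0.0, 1.0))
        c = mu["c"] + sigma["c"] * rng.gauss(0.0, 1.0)
        samples.append(max(a + c, b))
    return statistics.mean(samples), statistics.stdev(samples)


MU = {"a": 60.0, "b": 100.0, "c": 40.0}  # hypothetical nominal delays (ps)
SIGMA = {"a": 6.0, "b": 10.0, "c": 4.0}  # hypothetical standard deviations (ps)

print(deterministic_at(MU["a"], MU["b"], MU["c"]))  # 100.0
print(statistical_at(MU, SIGMA, rho_ab=0.0))         # mean roughly 104.9
print(statistical_at(MU, SIGMA, rho_ab=0.9))         # mean roughly 102.6
```

The mean of the max exceeds the max of the means, and ignoring a real correlation between a and b would overestimate it. Brute-force sampling like this does not scale to millions of timing quantities, which is the motivation for the analytic approach.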

Canonical form
$$ a = a_0 + \sum_{i=1}^{n} a_i \Delta X_i + a_{n+1} \Delta R_a $$
- $a_0$: constant (nominal value)
- $a_1, \ldots, a_n$: sensitivities
- $\Delta X_i$: deviation of the global sources of variation from their nominal values
- $\Delta R_a$: independently random uncertainty
- All timing quantities are parameterized by the sources of variation
- Correlation can be judged on demand by inspection

Statistical timing basics
- Represent all timing quantities in canonical form
  - Delays, slews, guard times, ATs, RATs, slacks, PLL adjusts, constraints, CPPR adjusts
- Propagate ATs forward through the timing graph
  - Addition of two canonical forms is easy
  - Max/min operations are also easy with the help of some analytic formulas
- Propagate RATs backward through the timing graph
  - Subtraction of two canonical forms is easy
  - Use statistical max/min operations
- Slack is simply the difference between AT and RAT
  - Since this is available in canonical form, we get sensitivities of circuit performance to sources of variation for free
  - These can be used to ensure a robust design
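The canonical-form operations above (add, subtract, read off variance and covariance by inspection) fit in a small Python class. This is a sketch under stated assumptions: every $\Delta X_i$ and $\Delta R$ is a unit normal, all quantities share the same list of global sources, and the random parts of any two operands are treated as independent. The class name and example numbers are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    """a0 + sum_i sens[i] * dX_i + a_r * dR, with every dX_i and dR ~ N(0, 1).

    The dX_i are global sources shared by all quantities (same order everywhere);
    dR is this quantity's own independent random part.
    """
    a0: float                # nominal value
    sens: tuple[float, ...]  # sensitivities a_1 .. a_n
    a_r: float = 0.0         # coefficient a_{n+1} of the independent random part

    def _combine(self, other, sign):
        return Canonical(
            self.a0 + sign * other.a0,
            tuple(x + sign * y for x, y in zip(self.sens, other.sens)),
            # Independent random parts never cancel: they add root-sum-square.
            math.hypot(self.a_r, other.a_r),
        )

    def __add__(self, other):
        return self._combine(other, 1.0)

    def __sub__(self, other):
        return self._combine(other, -1.0)

    def variance(self):
        return sum(s * s for s in self.sens) + self.a_r ** 2

    def covariance(self, other):
        # Read off by inspection: only the shared global sources contribute
        # (assumes the two random parts are independent of each other).
        return sum(x * y for x, y in zip(self.sens, other.sens))


at = Canonical(100.0, (5.0, 3.0), 4.0)  # hypothetical arrival time (ps)
rat = Canonical(120.0, (2.0, 3.0))      # hypothetical required arrival time (ps)
slack = rat - at
print(slack.a0, slack.sens, math.sqrt(slack.variance()))  # 20.0 (-3.0, 0.0) 5.0
```

The slack's sensitivity vector comes out for free: its dependence on the second global source cancels exactly, while a per-quantity covariance matrix is never stored.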

Statistical max operation
*C. E. Clark, "The greatest of a finite set of random variables," OR Journal, March-April 1961, pp. 145-162.
**M. Cain, "The moment-generating function of the minimum of bivariate normal random variables," American Statistician, May 1994, 48(2).

Unified view of correlations
$$ D = a_0 + \sum_i a_i \Delta X_i + a_r \Delta R $$
[Figure: correlation coefficient vs. distance, made up of a globally correlated part (chip-to-chip, wafer-to-wafer, batch-to-batch variation), a spatially correlated part (within-chip, distance-related correlation) and an independently random part.]
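Clark's analytic formulas for the max of two jointly Gaussian variables can be written with only the Python standard library. The sketch below returns the mean, the variance and the tightness probability P(A > B); block-based statistical timers commonly use that probability to re-express the max in canonical form, a step not shown here. Function and argument names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: phi = _STD.pdf, Phi = _STD.cdf


def clark_max(mu1, sigma1, mu2, sigma2, rho):
    """Mean, variance and tightness probability of max(A, B) for jointly Gaussian A, B."""
    theta_sq = sigma1 ** 2 + sigma2 ** 2 - 2.0 * rho * sigma1 * sigma2
    if theta_sq <= 0.0:
        # A - B is deterministic: the max is simply the one with the larger mean.
        if mu1 >= mu2:
            return mu1, sigma1 ** 2, 1.0
        return mu2, sigma2 ** 2, 0.0
    theta = math.sqrt(theta_sq)
    alpha = (mu1 - mu2) / theta
    t = _STD.cdf(alpha)  # tightness probability P(A > B)
    pdf = _STD.pdf(alpha)
    mean = mu1 * t + mu2 * (1.0 - t) + theta * pdf
    second_moment = ((mu1 ** 2 + sigma1 ** 2) * t
                     + (mu2 ** 2 + sigma2 ** 2) * (1.0 - t)
                     + (mu1 + mu2) * theta * pdf)
    return mean, max(second_moment - mean ** 2, 0.0), t


mean, var, t = clark_max(100.0, math.sqrt(52.0), 100.0, 10.0, rho=0.0)
print(round(mean, 2), round(t, 2))  # 104.92 0.5
```

When one input dominates (alpha large), the result collapses to that input's mean and variance, which is why the max of well-separated paths stays close to Gaussian in practice.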

Spatial correlation vs. early/late split
[Figure: the chip is divided into a grid of numbered regions (0-90); launching flip-flops LF1-LF3, the capturing flip-flop CF and the early clock path lie across these regions.]
- Dependence on common virtual variables cancels out at the timing test

4. Design for performance
C. Latches and flip-flops

Latch vs. Flip-Flop
[Figure courtesy of IEEE Press, New York, 2000]

Latch Pair vs. Flip-Flop
Performance metrics
- Delay metrics
  - Delay penalty
  - Clock skew penalty
  - Inclusion of logic
  - Inherent race immunity
- Power/energy metrics
  - Power/energy
  - PDP (power-delay product), EDP (energy-delay product)
- Design robustness
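The two energy metrics in the list can rank the same designs differently. In the sketch below the two flip-flops and their power and delay figures are hypothetical; PDP is average power times delay, and EDP multiplies by delay once more, so it favors speed.

```python
def pdp(avg_power_w, delay_s):
    """Power-delay product (J): energy consumed per switching event."""
    return avg_power_w * delay_s


def edp(avg_power_w, delay_s):
    """Energy-delay product (J*s): the PDP weighted once more by delay."""
    return pdp(avg_power_w, delay_s) * delay_s


# Two hypothetical flip-flop designs: (average power in W, delay in s).
DESIGNS = {"ff_fast": (100e-6, 100e-12), "ff_lowpower": (60e-6, 150e-12)}

for name, (p, d) in DESIGNS.items():
    print(f"{name}: PDP = {pdp(p, d) * 1e15:.1f} fJ, EDP = {edp(p, d) * 1e27:.0f} fJ*ps")
```

With these numbers the low-power design wins on PDP (9 fJ vs 10 fJ) but loses on EDP, which is one reason the choice of metric has to match the design target.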

Latches
[Figure: transmission-gate latch and C²MOS latch, each with input D and output Q]

Latches
[Figure courtesy of IEEE Press, New York, 2000]

Latch Pair as a Flip-Flop

Requirements for the Flip-Flop Design
- High speed of operation:
  - Small Clk-Output delay
  - Small setup time
  - Small hold time
- Inherent race immunity
- Low power
- Small clock load
- High driving capability
- Integration of logic into flip-flop
- Multiplexed or clock scan
- Robustness
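How a latch pair behaves as a flip-flop can be checked with a behavioral model. The sketch assumes ideal zero-delay latches evaluated once per discrete time step, a hypothetical simplification and not a circuit model: the master is transparent while the clock is low, the slave while it is high, so the pair samples D just before each rising edge. A single transparent-high latch is included for comparison.

```python
def latch(d, enable, q_prev):
    """Ideal level-sensitive latch: follows d while enable is true, holds q_prev otherwise."""
    return d if enable else q_prev


def single_latch(d_seq, clk_seq):
    """One latch, transparent while clk is high (for comparison)."""
    q, out = 0, []
    for d, clk in zip(d_seq, clk_seq):
        q = latch(d, clk, q)
        out.append(q)
    return out


def master_slave(d_seq, clk_seq):
    """Master transparent on clk low, slave on clk high: a rising-edge flip-flop."""
    m = s = 0
    out = []
    for d, clk in zip(d_seq, clk_seq):
        m = latch(d, not clk, m)  # master follows d while clk is low, holds while high
        s = latch(m, clk, s)      # slave passes the held master value while clk is high
        out.append(s)
    return out


CLK = [0, 0, 1, 1, 0, 0, 1, 1]
D = [1, 1, 1, 0, 0, 0, 0, 1]
print(single_latch(D, CLK))  # [0, 0, 1, 0, 0, 0, 0, 1]
print(master_slave(D, CLK))  # [0, 0, 1, 1, 1, 1, 0, 0]
```

The single latch lets the D changes at steps 3 and 7 through while the clock is high; the latch pair ignores them because the master is opaque during that phase.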

Sources of Noise
[Figure courtesy of IEEE Press, New York, 2000]

Types of Flip-Flops
- Latch Pair (Master-Slave): latches L1 and L2 in series
- Pulse-Triggered Latch: a single latch L clocked by a short pulse
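A pulse-triggered latch can be modeled the same way, assuming ideal latches and discrete time steps. Here the pulse is generated as clk AND NOT (clk delayed by `width` steps), one common way to build a pulse generator; the generator model, the `width` parameter and all names are hypothetical.

```python
def latch(d, enable, q_prev):
    """Ideal level-sensitive latch: follows d while enable is true, holds q_prev otherwise."""
    return d if enable else q_prev


def pulse_from_clock(clk_seq, width):
    """pulse[t] = clk[t] AND NOT clk[t - width]: high for `width` steps after each rising edge."""
    pulse = []
    for t, clk in enumerate(clk_seq):
        delayed = clk_seq[t - width] if t >= width else 0
        pulse.append(1 if clk and not delayed else 0)
    return pulse


def pulsed_latch(d_seq, clk_seq, width):
    """A single latch that is transparent only during the generated pulse."""
    q, out = 0, []
    for d, p in zip(d_seq, pulse_from_clock(clk_seq, width)):
        q = latch(d, p, q)
        out.append(q)
    return out


CLK = [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4
D = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(pulse_from_clock(CLK, 1))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(pulsed_latch(D, CLK, 1))   # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(pulsed_latch(D, CLK, 2))   # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

With a one-step pulse the latch samples D only at the rising edge. With a two-step pulse, a D change one step after the edge passes straight through: in this model the latch's hold window grows with the pulse width.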

Flip-Flop Delay
- Sum of setup time and Clk-Output delay is the true measure of the performance with respect to the system speed
- T = T_Clk-Q + T_Logic + T_Setup (ignoring skew)
[Figure: flip-flop (D, Q), then logic, then flip-flop (D, Q) on a common clock, with the cycle divided into T_Clk-Q, T_Logic and T_Setup.]

Delay vs. Setup/Hold Times
[Plot: Clk-Output delay [ps] (0 to 350) vs. Data-Clk offset [ps] (-200 to 200), with the setup and hold regions and the minimum Data-Output point marked.]
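The cycle-time equation and the "minimum Data-Output" point on the plot can be combined in a short calculation. The characterization curve and the logic delay below are hypothetical; the sketch picks the Data-Clk offset that minimizes Data-Output delay (offset plus Clk-Output delay) and uses that offset as the effective setup time.

```python
def min_cycle_time(t_clk_q, t_logic, t_setup):
    """T = T_Clk-Q + T_Logic + T_Setup (clock skew ignored), all in ps."""
    return t_clk_q + t_logic + t_setup


def best_setup_point(curve):
    """Pick the point minimizing Data-Output delay = Data-Clk offset + Clk-Output delay.

    curve: (data_clk_ps, clk_output_ps) pairs, where data_clk_ps is how long before
    the clock edge the data arrives.
    """
    return min(curve, key=lambda point: point[0] + point[1])


# Hypothetical characterization: Clk-Output delay rises as data arrives closer to the clock.
CURVE = [(200.0, 100.0), (100.0, 102.0), (50.0, 110.0),
         (30.0, 125.0), (20.0, 145.0), (10.0, 180.0)]

setup, clk_q = best_setup_point(CURVE)
period = min_cycle_time(clk_q, t_logic=800.0, t_setup=setup)
print(setup, clk_q, period, 1e3 / period)  # 30.0 125.0 955.0, then about 1.047 (GHz)
```

Choosing a larger setup time would shrink Clk-Q only slightly, and choosing a smaller one inflates Clk-Q sharply; minimizing their sum is what the slide means by the "true measure" of flip-flop overhead.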

Master-Slave Latch Pairs
Case 1: PowerPC 603 (Gerosa, JSSC 12/94)
[Figure: schematic]

Master-Slave Latches
Case 2: C²MOS
[Figure: schematic with D, Ck, Ckb and Q]
- Feedback added for static operation
- Locally generated clock
- Poor driving capability

Next Lecture
- Latches and flip-flops