Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 8: State-of-the-Art Clocked Storage Elements in CMOS Technology
Wiley-Interscience and IEEE Press, January 2003
Nov. 14, 2003

- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Transmission Gate Latches
(a) Simplest implementation
- Only 4 transistors
- Dynamic while the latch holds its state (transmission gate off)
- Susceptible to noise
(b) Basic static latch
- Pull-up/pull-down keeper
- Conflict at the state node whenever new data is written
(c) Complete implementation
- Feedback turned off when writing to the latch
- No conflict
- Larger clock load

Transmission Gate Master-Slave Latch (MSL)
[Figure: master transmission gate latch followed by slave transmission gate latch]
- MSL with unprotected input (Gerosa et al. 1994), Copyright 1994 IEEE
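
The master-slave arrangement above can be illustrated with a small behavioral sketch. It is not a circuit-level model: it only shows how two level-sensitive latches in series behave as an edge-triggered element. The clock phases (master transparent at clk=0, slave transparent at clk=1) and the names `Latch`, `MasterSlaveLatch`, and `simulate` are assumptions made for illustration.

```python
class Latch:
    """Level-sensitive latch: follows d while enabled, holds otherwise."""

    def __init__(self, state=0):
        self.state = state

    def update(self, d, enable):
        if enable:
            self.state = d
        return self.state


class MasterSlaveLatch:
    """Two latches in series on opposite clock phases (assumed polarity)."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def step(self, d, clk):
        # Assumption: master is transparent at clk=0, slave at clk=1.
        m = self.master.update(d, enable=(clk == 0))
        return self.slave.update(m, enable=(clk == 1))


def simulate(d_wave, clk_wave):
    """Apply one (d, clk) sample per time step and return the Q samples."""
    msl = MasterSlaveLatch()
    return [msl.step(d, c) for d, c in zip(d_wave, clk_wave)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    d = [1, 1, 0, 0, 0, 0, 1, 1]
    print(simulate(d, clk))
```

In this sketch, changes of D while the clock is high do not reach the output, because the master is opaque during that phase.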

Transmission Gate MS Latch (continued)
[Figure: MSL with input gate isolation]
- Protection from input noise
- MSL with input gate isolation (Markovic et al. 2001), Copyright 2001 IEEE

Noise Robustness of MS Latch
[Figure: latch fed by a distant driver, with noise sources 1 to 5 marked]
Sources of noise affecting the latch state node:
1. Noise on input
2. Leakage
3. α-particle and cosmic rays
4. Unrelated signal coupling
5. Power supply ripple
(Partovi in Chandrakasan et al. 2001), Copyright 2001 IEEE

Clocked CMOS (C2MOS) Latch
[Figure: transmission gate latch with gate isolation (dynamic); C2MOS latch (dynamic)]

Clocked CMOS (C2MOS) MS Latch
[Figure: C2MOS master-slave latch]
- State-keeping feedbacks outside the D-to-Q path
(Suzuki et al. 1973), Copyright 1973 IEEE

MS Latches: Comparison
[Figures: delay and race immunity (R); energy per cycle]
C2MOS has larger clock transistors:
- Smaller delay and race immunity (80% of MSL)
- Higher energy consumption (1.4x more than MSL)

- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Hybrid Latch Flip-Flop (HLFF)
- Transparent to D only when Clk and Clk1 are both high
- Limited clock uncertainty absorption
- Small delay
- Small clock load
(Partovi et al. 1996), Copyright 1996 IEEE

Semidynamic Flip-Flop (SDFF)
- Dynamic-style first stage
- Fast, small clock load, logic embedding
- Consumes energy for evaluation whenever D=1
- Dynamic-to-static latch in second stage
- Static 1 hazard
(Klass 1998), Copyright 1998 IEEE

Static 1 Hazard in SDFF
- If D=Q=1 in the previous cycle, a race between the clock and the first-stage output causes Q to falsely switch to 0, generating a glitch
- Also seen in HLFF

Sense-Amplifier Flip-Flop (SAFF)
- When Clk=0, S and R are high, and the outputs are unchanged
- At the rising edge of Clk, the sense amplifier in the 1st stage generates a low pulse on either S or R, based on which of D and its complement is higher
- The other node (R or S) is driven high, preventing further changes
- The latch captures the low level of S or R and updates the output
- Both NAND gates must switch in sequence to change the outputs
- Original design (Montanaro et al. 1996), Copyright 1996 IEEE
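
The remark that both NAND gates must switch one after the other can be illustrated with a unit-delay sketch of a cross-coupled NAND latch driven by active-low set and reset pulses. This is a simplified logic-level illustration, not the transistor-level circuit: the first stage is reduced to a function that emits the pulse, every gate is assumed to have the same delay, and the names `sense_stage` and `settle_nand_latch` are hypothetical.

```python
def nand(a, b):
    return 1 - (a & b)


def sense_stage(clk_rising, d):
    """Simplified first stage: returns active-low (s, r)."""
    if not clk_rising:
        return 1, 1  # precharged: both nodes high, latch holds
    return (0, 1) if d == 1 else (1, 0)


def settle_nand_latch(s, r, q, q_n, max_steps=8):
    """Update both NAND gates once per unit delay until nothing changes.

    Returns (q, q_n, steps), where steps is the number of gate delays
    that elapsed before the outputs became stable.
    """
    for steps in range(max_steps):
        new_q, new_q_n = nand(s, q_n), nand(r, q)
        if (new_q, new_q_n) == (q, q_n):
            return q, q_n, steps
        q, q_n = new_q, new_q_n
    raise RuntimeError("latch did not settle")


if __name__ == "__main__":
    s, r = sense_stage(clk_rising=True, d=1)
    print(settle_nand_latch(s, r, q=0, q_n=1))   # set from the reset state
    s, r = sense_stage(clk_rising=False, d=1)
    print(settle_nand_latch(s, r, q=1, q_n=0))   # hold
```

Under these assumptions a state change takes two gate delays, because the second output can only switch after the first one has, while holding takes none.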

SAFF: Evolution of 2nd Stage Latch
[Figure: three second-stage latch variants]
- All-n-MOS push-pull (Gieseke et al. 1991)
- Complementary push-pull (Oklobdzija and Stojanovic 2001)
- Complementary push-pull with gated keeper (Nikolic, Stojanovic, Oklobdzija, Jia, Chiu, Leung 1999)

Modified Sense-Amplifier Flip-Flop (M-SAFF)
- Sense amplifier in 1st stage generates a low pulse on either S or R, based on which of D and its complement is higher
- Symmetric latch in 2nd stage: outputs are simultaneously pulled to Vdd and Gnd, so it is fast
- Large drive capability, so the latch can be small
- Keeper in latch is active only when there is no change: no conflict
(Nikolic et al. 1999), Copyright 1999 IEEE

Flip-Flops and MS Latches: Delay Comparison
[Figure: bar chart of delay in FO4 (axis 0.0 to 5.0) for MSL, C2MOS, HLFF, SDFF, SAFF, M-SAFF. CSE delay comparison (0.18 μm, high load)]
- MS latches are slow: positive setup time, two latches in critical path
- SAFF is slow: it waits for one output to switch the other
- Fastest structures are simple flip-flops with negative setup time

Flip-Flops: Timing Comparisons with Voltage Scaling
[Figures: delay comparison; internal race immunity comparison (0.25 μm, light load)]
- Delay comparison: relative delay reduces with supply voltage due to reduction of body effect
- Internal race immunity comparison: small race immunity, usually not a concern in critical paths

Flip-Flops and MS Latches: Energy Comparison
[Figure: stacked bar chart of energy in fJ (axis 0 to 120) for MSL, C2MOS, HLFF, SDFF, SAFF, M-SAFF, broken down into external clock, external data, internal clock, and internal non-clock energy. CSE energy breakdown (0.18 μm, 50% activity, high load)]
- In MS latches, internal nodes change only when the input changes
- SAFF, M-SAFF: very small clock load, small 2nd stage latch
- Most energy is consumed in HLFF and SDFF, with pulse generator and high internal switching activity

Flip-Flops and MS Latches: Energy Comparisons
[Figure: energy comparison (0.25 μm, light load)]
(Markovic et al. 2001), Copyright 2001 IEEE

- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Gated Transmission Gate MS Latch
- Concept: inhibit clock switching when the new D equals Q
- comp = D XNOR Q
- If comp=0 (D≠Q), the circuit works as an MSL
- If comp=1 (D=Q), the internal clocks are held and the latches stay closed: no output change, no internal power
[Figure: gated MSL with comparator and transistor sizing annotations]
Gated MSL (Markovic et al. 2001), Copyright 2001 IEEE
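
The gating concept on this slide, comparing the incoming data with the stored value and only letting the clock through when they differ, can be sketched at the behavioral level. This is an illustrative model under simplifying assumptions: one data sample per clock cycle, no timing, and hypothetical names (`xnor`, `run_gated_latch`). It counts how many cycles actually deliver an internal clock pulse.

```python
def xnor(a, b):
    return 1 if a == b else 0


def run_gated_latch(d_stream, q0=0):
    """Return (outputs, internal_clock_pulses) for a stream of data bits.

    comp = D XNOR Q. When comp is 1 the internal clock is inhibited and
    the stored value is kept; when comp is 0 the latch is clocked and
    takes the new value.
    """
    q = q0
    internal_clock_pulses = 0
    outputs = []
    for d in d_stream:
        comp = xnor(d, q)
        if comp == 0:
            internal_clock_pulses += 1  # clock passes only on a data change
            q = d
        outputs.append(q)
    return outputs, internal_clock_pulses


if __name__ == "__main__":
    data = [0, 0, 1, 1, 1, 0, 0, 0]
    outputs, pulses = run_gated_latch(data)
    print(outputs, pulses, "of", len(data), "cycles clocked")
```

The output stream is identical to that of an ungated element, while the number of internal clock pulses equals the number of data transitions, which is why the benefit depends on the switching activity of the input.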

Gated TG MS Latch: Timing and Energy
[Figures: setup time (U) and hold time (H) comparison with MSL; energy comparison with MSL]
- Increased setup time in the gated MSL due to inclusion of the comparator in the critical path: slower than the conventional MSL
- Smaller energy per transition if the switching activity of D is below 0.3
- For higher switching activity, the comparator and clock generator dominate the energy consumption

Data-Transition Look-Ahead Latch (DTLA-L)
[Figure: pulse generator, clock control, and data-transition look-ahead blocks]
- Pulsed latch in which the generation of clock pulses is gated with an XOR DTLA circuit
- If P1=0, the circuit operates as a conventional pulsed latch
- If D=Q, then P1=1 and the latch clock stays at 0: no output change or energy consumption in the latch
- XOR circuit and clock control are in the critical path: large setup time and D-Q delay
(Nogawa and Ohtomo 1998), Copyright 1998 IEEE

DTLA-L: Analysis of Energy Consumption
[Figure: DTLA-L without clock gating (pulse generator, latch, input capacitance C_in) and DTLA-L with clock control (CC)]
Average energy per cycle, weighted by the input switching activity α:
  E = \frac{\alpha}{2}\,(E_{0\to1} + E_{1\to0}) + (1-\alpha)\,E_{idle}
The transition energies E_{0\to1} and E_{1\to0} add clock (E_{CLK}), latch (E_L), internal (E_{int}), and external-load (E_{ext}) terms to E_{idle}.
Idle energy:
  E_{idle} = E_{0\to0} = E_{1\to1} = \frac{E_{PG}}{N} + E_{Cin}
- α: input switching activity
- Pulse generator shared among N DTLA-Ls

Energy Comparison of DTLA-L and CMSL
[Figure: regions where E(DTLA-L) < E(CMSL) and where E(DTLA-L) > E(CMSL)]
- DTLA-L is more energy-efficient than CMSL when N > 2 and α < 0.25
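
The energy analysis on this slide can be explored numerically with a short sketch. It assumes the activity-weighted form E = (α/2)(E_0→1 + E_1→0) + (1 − α)E_idle and an idle energy of E_PG/N + E_Cin, which is one reading of the slide's equations. All numeric values below are made up for illustration and are not taken from the source; the function names are hypothetical.

```python
def idle_energy(e_pg, n, e_cin):
    """Idle energy when one pulse generator is shared among n latches."""
    return e_pg / n + e_cin


def average_energy(alpha, e01, e10, e_idle):
    """Average energy per cycle for input switching activity alpha."""
    return 0.5 * alpha * (e01 + e10) + (1.0 - alpha) * e_idle


def break_even_activity(gated, ungated):
    """Activity at which two designs consume the same average energy.

    Each design is a tuple (e01, e10, e_idle). The average energy is
    linear in alpha, so the crossing point has a closed form.
    """
    g01, g10, g_idle = gated
    u01, u10, u_idle = ungated
    slope_gated = 0.5 * (g01 + g10) - g_idle
    slope_ungated = 0.5 * (u01 + u10) - u_idle
    return (u_idle - g_idle) / (slope_gated - slope_ungated)


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units.
    gated = (12.0, 8.0, 1.0)     # costlier transitions, cheap idle cycles
    ungated = (8.0, 6.0, 3.0)    # cheaper transitions, costly idle cycles
    alpha_star = break_even_activity(gated, ungated)
    print("break-even activity:", alpha_star)
    print(average_energy(alpha_star, *gated), average_energy(alpha_star, *ungated))
```

With these placeholder numbers the gated design wins below the break-even activity and loses above it, which is the qualitative behavior the comparison slide describes.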

Clock-on-Demand PL (COD-PL)
[Figure: pulse generator with XNOR data-transition look-ahead]
- Pulsed latch in which the generation of clock pulses is gated with an XNOR DTLA circuit
- If D≠Q, XNOR=0: the latch clock goes to 1, and returns to 0 after Q has changed to D
- If D=Q, XNOR=1 and the latch clock stays at 0: no output change or energy consumption in the latch
- Pulse generator includes clock control, so it cannot be shared among multiple PLs
(Hamada et al. 1999), Copyright 1999 IEEE

Energy-Efficient Pulse Generator in COD-PL
[Figure: two pulse generator implementations with XNOR input and internal capacitance C_int]
- Straightforward implementation with CMOS gates: C_int switches in each cycle, energy-inefficient
- Compound AND-NOR gate: energy-efficient

Impact of Circuit Sizing on the Energy Efficiency of COD-PL
[Figure: energy comparison for different sizings]
- COD-PL is more effective in high-speed sizing due to large clock transistors
(Markovic et al. 2001), Copyright 2001 IEEE

Conditional Capture Flip-Flop
- First stage: pulse generator with internal clock gating
- Outside the sampling window, S=R=1
- Within the sampling window, S can switch low if D=1 and Q=0, and R can switch low if D=0 and Q=1
- Otherwise S=R=1: no energy consumption
- Second stage: pass-gate implementation of M-SAFF latch (Oklobdzija, Stojanovic)
- No setup time degradation due to clock gating
(Kong et al. 2000), Copyright 2000 IEEE

Comparison of Latches and Flip-Flops with Local Clock Gating: Timing
Delay comparison:
- Delay relatively constant with supply voltage
- Latches with clock gating have large delay due to large setup time
Internal race immunity comparison:
- Generally R(FF) < R(MSL) < R(gated MSL)
- COD-PL has low race immunity due to very wide clock pulse
(Markovic et al. 2001), Copyright 2001 IEEE

Comparison of Latches and Flip-Flops with Local Clock Gating: Energy, EDP
Energy comparison:
- Latches with gated clock consume less energy than MSL if α < 0.2-0.3
Energy-Delay Product comparison:
- α < 0.03: G-MSL best
- 0.03 < α < 0.23: DTLA-L best
- 0.23 < α: conventional MSL best
(Markovic et al. 2001), Copyright 2001 IEEE

- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

N-Only Clocked Latches
[Figure: (a) N-MSL, (b) N-FF, (c) N-PL, (d) N-PPL; pulse generator for (b)-(d)]
- Concept: bring the clock only to n-MOS transistors to allow reduced clock swing without conflict with partially turned-off p-MOS transistors
- Reduced clock swing reduces clocking energy with some penalty in performance
- Clock is always in the critical path, as its edge signals when to change the output
- (a) conventional TG MSL, (b) pulsed-latch, (c) conventional PL, (d) push-pull PL

Low Clock Swing CSEs Comparison: Energy and Delay
[Figure: energy per cycle (normalized) versus Data-to-Q delay (FO3 inverter delay) for (a) high-Vdd and (b) low-swing clocking, with points for MSL, PL, N-MSL, N-PL, N-PPL, N-FF. 130 nm technology, 50 fF load, max. input cap = 12.5 fF, data activity = 0.1]
Full-swing clock:
- PL preferred for high speed
- MSL preferred for low energy
Low-swing clock:
- N-FF preferred for high speed
- N-PPL preferred for low energy

Effect of Clock Noise on Low-Swing Clock Latch Delay
[Figure: delay degradation (0% to 20%) versus noise on low-swing clock (0% to 12%) for N-CL, N-PPL, N-FF]
- All latches fail for clock noise > 12% of clock voltage
- N-FF gives best clock noise rejection
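
Why a reduced clock swing lowers clocking energy can be shown with a first-order estimate. The sketch below uses the standard switching-energy relation for a capacitive load: charging a capacitance C through a swing V_swing from a supply V_supply draws C · V_swing · V_supply per cycle, which becomes C · V_swing² when the clock driver runs from a supply equal to the swing. The capacitance and voltages are hypothetical and do not come from the slides; the function name is an illustrative choice.

```python
def clock_energy_per_cycle(c_clk, v_swing, v_supply=None):
    """First-order energy drawn from the supply per clock cycle.

    c_clk    -- total switched clock capacitance in farads
    v_swing  -- clock voltage swing in volts
    v_supply -- supply that charges the clock node; defaults to v_swing,
                i.e. a dedicated low-voltage clock supply (assumption)
    """
    if v_supply is None:
        v_supply = v_swing
    return c_clk * v_swing * v_supply


if __name__ == "__main__":
    c_clk = 100e-15               # hypothetical 100 fF clock load
    full = clock_energy_per_cycle(c_clk, 1.2)
    half = clock_energy_per_cycle(c_clk, 0.6)
    half_from_vdd = clock_energy_per_cycle(c_clk, 0.6, v_supply=1.2)
    print("full swing:", full)
    print("half swing, dedicated supply:", half, "ratio", half / full)
    print("half swing, drawn from full Vdd:", half_from_vdd, "ratio", half_from_vdd / full)
```

Under these assumptions, halving the swing cuts clocking energy to a quarter with a dedicated supply and to a half when the charge is still drawn from the full supply. The performance penalty noted on the slide is not modeled here.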

- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

DET Latch-Mux Circuit (DET-LM)
- Pass-gate latches: one transparent when Clk=0, one transparent when Clk=1
- Pass-gate multiplexer that selects the output of the opaque latch
(Llopis and Sachdev 1996), Copyright 1996 IEEE
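
The latch-mux structure described above can be sketched behaviorally: two level-sensitive latches on opposite clock levels, with a multiplexer that always forwards the latch that is currently opaque. The class and method names are hypothetical, and timing, setup, and hold effects are ignored.

```python
class DetLatchMux:
    """Behavioral dual-edge-triggered latch-mux."""

    def __init__(self):
        self.low_latch = 0    # transparent while clk == 0
        self.high_latch = 0   # transparent while clk == 1

    def step(self, d, clk):
        if clk == 0:
            self.low_latch = d
            return self.high_latch   # mux selects the opaque latch
        self.high_latch = d
        return self.low_latch        # mux selects the opaque latch


def simulate(d_wave, clk_wave):
    lm = DetLatchMux()
    return [lm.step(d, c) for d, c in zip(d_wave, clk_wave)]


if __name__ == "__main__":
    clk = [0, 1, 1, 0, 0, 1]
    d = [1, 1, 0, 0, 1, 1]
    print(simulate(d, clk))
```

In this model the output takes a new value after both the rising and the falling clock transition, each time presenting the data that the now-opaque latch held just before the edge.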

C2MOS Latch-Mux (C2MOS-LM)
[Figure: C2MOS latch-mux with shared clock transistors N2 to N5]
- C2MOS latches: one transparent when Clk=1, one transparent when Clk=0
- Multiplexer: two C2MOS inverters that propagate the output of the opaque latch
- Large clock transistors shared between the latches and the multiplexer
(Gago et al. 1993), Copyright 1993 IEEE

Pulsed Latch (DET-PL)
[Figure: pulse generator and pulsed latch, (a) single-edge triggered, (b) dual-edge triggered]
- Transparent to D only during two short windows, one shortly after each edge of the clock
- DET-PL consumes a lot of energy for four clocked pass gates
- To improve speed, modified from the original design (Strollo et al. 1999), which implemented an n-MOS-only pass gate and p-MOS-only keeper

DET Symmetric Pulse Generator Flip-Flop (SPGFF)
[Figure: 1st stage X, 1st stage Y, 2nd stage]
- Two pulse generators: X active at the rising edge of the clock, Y active at the falling edge of the clock
- X and Y alternately precharge and evaluate
- At any moment, one of X and Y keeps the value of data sampled at the most recent clock edge
- The other (X or Y) is precharged high
- Pulses at X and Y have the same width as the clock
- Second stage is a simple NAND gate: no need for a latch

SET vs. DET: Delay Comparison
[Figure: bar chart of delay in FO4 (axis 0 to 6), SET and DET, for MSL/LM, C2MOS-LM, PL, SPGFF]
- Latch-muxes have two equally critical paths, somewhat shorter than that of the MSL
- DET-PL is more complex, adding more capacitance to the critical path compared to SET PL
- SPGFF has a short domino-like critical path: fastest
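
The alternating precharge and evaluate behavior of the two first-stage nodes, combined by a NAND gate, can be sketched as follows. The polarity is an assumption made for illustration: an evaluating node is taken to go low when the sampled data is 1 and stay high otherwise, so that the NAND output equals the sampled data. The class name and interface are hypothetical, and the model is purely logical.

```python
def nand(a, b):
    return 1 - (a & b)


class SymmetricPulseGeneratorFF:
    """Behavioral dual-edge flip-flop with nodes X and Y and a NAND output."""

    def __init__(self):
        self.prev_clk = 0
        self.x = 1   # precharged high
        self.y = 1   # precharged high

    def step(self, d, clk):
        if clk != self.prev_clk:
            if clk == 1:
                # Rising edge: X evaluates, Y is precharged high.
                self.x = 1 - d
                self.y = 1
            else:
                # Falling edge: Y evaluates, X is precharged high.
                self.y = 1 - d
                self.x = 1
            self.prev_clk = clk
        # One node holds the sample, the other is high, so NAND recovers it.
        return nand(self.x, self.y)


def simulate(d_wave, clk_wave):
    ff = SymmetricPulseGeneratorFF()
    return [ff.step(d, c) for d, c in zip(d_wave, clk_wave)]


if __name__ == "__main__":
    clk = [0, 1, 1, 0, 0, 1]
    d = [0, 1, 0, 0, 1, 0]
    print(simulate(d, clk))
```

Because one of the two nodes is always precharged high, the NAND output depends only on the node that holds the most recent sample, which is why no separate output latch is needed in this scheme.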

SET vs. DET: Power Consumption Comparison
[Figure: bar chart of power in μW (axis 0 to 180), with the non-clock component shown separately, for MSL, LM, SET PL, DET PL, SET C2MOS, DET C2MOS, SPGFF (0.18 μm, 500 MHz for SET, 250 MHz for DET, high load)]
- LMs benefit from clever implementation of the latch-mux structure with clock transistor sharing
- DET-PL adds extra high-activity capacitance compared to SET PL
- SPGFF power consumption is in the middle, mainly due to alternate switching of nodes X and Y

SET vs. DET: EDP Comparison
[Figure: bar chart of EDP in fJ/500 MHz (single edge) and fJ/250 MHz (double edge), axis 0 to 60, for MSL/LM, C2MOS, PL, SPGFF (0.18 μm, 500 MHz for SET, 250 MHz for DET, high load)]
- Latch-muxes have similar or better EDP than their SET counterparts
- DET-PL exhibits worse delay and energy compared to SET PL, due to more complex design
- SPGFF is fastest with moderate energy consumption: lowest EDP
- EDP(SPGFF) < EDP(LM) < EDP(PL)