EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 21: Asynchronous Design, Synchronization, Clock Distribution

Self-Timed Pipelined Datapath
[Figure: pipeline In -> R1 -> F1 -> R2 -> F2 -> R3 -> F3 -> Out. Logic blocks F1, F2, F3 have Start and Done signals and delays t_pF1, t_pF2, t_pF3. Handshake (HS) blocks exchange Req and Ack between stages.]
Hand-Shaking Protocol
(a) Sender-receiver configuration: SENDER passes Data and Req to RECEIVER; RECEIVER returns Ack.
(b) Timing diagram, two-phase handshake: event 1 on Data, event 2 on Req, event 3 on Ack, shown for cycle 1 and cycle 2. Events are marked as sender's action or receiver's action.

Event Logic: The Muller-C Element
(a) Schematic: inputs A and B, output F.
(b) Truth table:
  A  B  F(n+1)
  0  0  0
  0  1  F(n)
  1  0  F(n)
  1  1  1
Implementations: (a) Logic (S-R flip-flop with output Q = F), (b) Majority function, (c) Dynamic.
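The Muller-C element truth table can be expressed as a small behavioral model. This is a minimal sketch in Python, not a circuit description; the class name `CElement` and its `update` method are hypothetical names chosen for illustration.

```python
class CElement:
    """Behavioral model of a two-input Muller-C element (hypothetical helper)."""

    def __init__(self, initial=0):
        self.f = initial  # stored output F(n)

    def update(self, a, b):
        # Output follows the inputs only when they agree; otherwise it holds.
        if a == b:
            self.f = a
        return self.f


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 1), (1, 1), (1, 0), (0, 1), (0, 0)]]
print(trace)
```

The output changes only at the second step (both inputs 1) and the last step (both inputs 0); in between it holds its state, which is why the slides describe event logic built from this element as having state.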
2-Phase Handshake Protocol
[Figure: sender logic and receiver logic exchange Data; the handshake logic is a Muller-C element (C) connecting Data ready, Data accepted, Req and Ack.]
Advantage: FAST, minimal number of signaling events (important for global interconnect)
Disadvantage: edge-sensitive, has state

Example: Self-timed FIFO
[Figure: In -> R1 -> R2 -> R3 -> Out. Registers have En and Done signals; a chain of three C-elements links Req_i, Req_o, Ack_i and Ack_o.]
All 1s or 0s -> pipeline empty
Alternating 1s and 0s -> pipeline full
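The empty and full patterns of the self-timed FIFO can be reproduced with a small simulation. The sketch below assumes the usual Muller pipeline wiring, in which each C-element sees the previous stage's output and the inverted output of the next stage; the slide's figure is not recoverable from the text, so that wiring and all function names here are assumptions.

```python
def c_element(a, b, prev):
    """Muller-C element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def settle(req_in, ack_out, stages):
    """Re-evaluate the C-element chain until no output changes."""
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = req_in if i == 0 else stages[i - 1]
            right = ack_out if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


def fill_until_full(n_stages=3):
    """Toggle Req_i (2-phase) with a stalled receiver until a request is refused."""
    stages = [0] * n_stages  # all 0s: empty pipeline
    req_in, ack_out = 0, 0   # ack_out never toggles: receiver is stalled
    accepted = 0
    while True:
        req_in ^= 1                       # sender signals a new event
        stages = settle(req_in, ack_out, stages)
        if stages[0] != req_in:           # Ack_i did not follow: FIFO is full
            return accepted, stages
        accepted += 1


accepted, state = fill_until_full(3)
print(accepted, state)
```

Under these assumptions a three-stage chain accepts three transfers and then stalls with its C-element outputs alternating, matching the "alternating 1s and 0s" condition on the slide.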
2-Phase Protocol
[Figure]

Example
From [Horowitz]
Example (continued)
[Figure]

Example (continued)
[Figure]
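Example (continued)
[Figure]

4-Phase Handshake Protocol
[Timing diagram: event 1 on Data, events 2 and 4 on Req, events 3 and 5 on Ack, shown for cycle 1 and cycle 2. Events are marked as sender's action or receiver's action.]
Also known as RTZ (return-to-zero)
Slower, but unambiguous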
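The difference between the two protocols comes down to the number of control-signal events per transfer. The following sketch lists the (Req, Ack) values after each event of one transfer; the function names are hypothetical and the data lines are left out.

```python
def two_phase_transfer(req, ack):
    """One 2-phase transfer: each signal toggles once."""
    assert req == ack, "idle when Req and Ack agree"
    req ^= 1
    yield (req, ack)  # sender toggles Req: data ready
    ack ^= 1
    yield (req, ack)  # receiver toggles Ack: data accepted


def four_phase_transfer(req, ack):
    """One 4-phase (return-to-zero) transfer: both signals rise, then fall."""
    assert (req, ack) == (0, 0), "idle only when both signals are low"
    yield (1, 0)  # sender raises Req
    yield (1, 1)  # receiver raises Ack
    yield (0, 1)  # sender lowers Req
    yield (0, 0)  # receiver lowers Ack


two = list(two_phase_transfer(0, 0))
four = list(four_phase_transfer(0, 0))
print(len(two), two)
print(len(four), four)
```

The 4-phase transfer always ends in the same idle state, so signal levels alone identify the protocol state. The 2-phase transfer ends with Req and Ack equal but at either level, which is why the receiver must respond to edges.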
4-Phase Handshake Protocol
Implementation using Muller-C elements
[Figure: sender logic and receiver logic exchange Data; the handshake logic uses two C-elements (C) and an S element to connect Data ready, Data accepted, Req and Ack.]

Self-Resetting Logic
[Figure: three precharged logic blocks (L1, L2, L3) in series, each with its own completion detection. Circuit detail: V_DD, internal node int, output out, inputs A, B, C, and post-charge logic.]
Asynchronous-Synchronous Interface
[Figure: asynchronous system (f_in) -> synchronization -> synchronous system (f_CLK).]

Synchronizers and Arbiters
Arbiter: circuit to decide which of 2 events occurred first
Synchronizer: arbiter with clock φ as one of the inputs
Problem: the circuit HAS to make a decision in limited time; which decision it makes is not important
Caveat: it is impossible to ensure correct operation
But we can decrease the error probability at the expense of delay
A Simple Synchronizer
[Figure: latch with input D, clock CLK, internal node int, inverters I1 and I2, and output Q.]
Data sampled on rising edge of the clock
Latch will eventually resolve the signal value, but... this might take infinite time!

Synchronizer: Output Trajectories
[Plot: V_out (0.0 to 2.0 V) versus time (0 to 300 ps).]
Single-pole model for a flip-flop
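The single-pole model explains why resolution can take arbitrarily long. In the standard form of that model the distance from the metastable point V_M grows exponentially, v(t) = V_M + (v(0) - V_M) e^(t/τ), so the time to reach a valid level depends on the logarithm of the initial offset. The sketch below assumes that form; V_M = 1.0 V, the 1.5 V target and τ = 310 ps are illustrative values, not read from the plot.

```python
import math


def resolution_time(v0, v_m, v_target, tau):
    """Time for |v - v_m| to grow from |v0 - v_m| to |v_target - v_m|."""
    d0 = abs(v0 - v_m)
    if d0 == 0:
        return math.inf  # starting exactly at V_M: never resolves in this model
    return max(0.0, tau * math.log(abs(v_target - v_m) / d0))


V_M, V_TARGET, TAU = 1.0, 1.5, 310e-12  # assumed values for illustration

for offset in (1e-1, 1e-3, 1e-6, 1e-9):
    t = resolution_time(V_M + offset, V_M, V_TARGET, TAU)
    print(f"initial offset {offset:.0e} V -> {t * 1e12:.0f} ps")
```

Every factor of 10 reduction in the initial offset adds a fixed τ·ln(10) to the resolution time, and an offset of exactly zero gives an unbounded time.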
Mean Time to Failure
[Equation not recoverable from the extracted text.]

Example
T_φ = 10 nsec = T
T_signal = 50 nsec
t_r = 1 nsec
τ = 310 psec
V_IH - V_IL = 1 V (V_DD = 5 V)
N(T) = 3.9 × 10^-9 errors/sec
MTF(T) = 2.6 × 10^8 sec = 8.3 years
MTF(0) = 2.5 µsec
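The example numbers can be checked with a short calculation. The formula on the slide did not survive extraction, so the sketch below assumes the usual single-pole result, N(T) = ((V_IH - V_IL) / V_swing) · (t_r / T_signal) · e^(-T/τ) / T_clk, with V_swing taken equal to V_DD = 5 V. The function and parameter names are hypothetical; the values are the slide's (10 nsec clock period, 50 nsec signal period, 1 nsec rise time, 310 psec time constant).

```python
import math


def sync_error_rate(t_wait, tau, t_clk, t_signal, t_r, v_window, v_swing):
    """Synchronization errors per second after waiting t_wait (assumed model)."""
    # Probability that a sample lands in the undefined region V_IL..V_IH.
    p_init = (v_window / v_swing) * (t_r / t_signal)
    # The metastable population decays exponentially; one sample per clock.
    return p_init * math.exp(-t_wait / tau) / t_clk


params = dict(tau=310e-12, t_clk=10e-9, t_signal=50e-9, t_r=1e-9,
              v_window=1.0, v_swing=5.0)

n_T = sync_error_rate(10e-9, **params)  # wait one full clock period
n_0 = sync_error_rate(0.0, **params)    # no waiting at all

print(f"N(T)   = {n_T:.2e} errors/sec")
print(f"MTF(T) = {1 / n_T:.2e} sec")
print(f"MTF(0) = {1 / n_0:.2e} sec")
```

Under these assumptions the calculation gives about 3.9 × 10^-9 errors/sec and 2.6 × 10^8 sec for a one-period wait, and 2.5 µsec with no wait, in line with the example.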
Influence of Noise
[Figure: probability density p(v) between 0, V_IL and V_IH. Labels: initial distribution, uniform distribution around V_M; after time T, logarithmic reduction, still uniform.]
Low amplitude noise does not influence synchronization behavior

Typical Synchronizers
[Figure: 2-phase clocking circuit with clocks φ1 and φ2 and output Q; variant using a delay line.]
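Cascaded Synchronizers Increase MTF
Each added stage lowers the failure rate, at the expense of delay.
[Figure: In -> Sync -> O1 -> Sync -> O2 -> Sync -> Out, all clocked by φ.]

Arbiters
(a) Schematic symbol: Arbiter with inputs Req1, Req2 and outputs Ack1, Ack2.
(b) Implementation: cross-coupled gates with inputs Req1, Req2, internal nodes A, B and outputs Ack1, Ack2.
(c) Timing diagram: Req1, Req2, voltages of A and B (metastable for T_gap), and Ack1 versus time t.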
PLL-Based Synchronization
[Figure: Chip 1 and Chip 2 each contain a digital system and a PLL, and exchange Data. A crystal oscillator supplies the reference clock. Chip 1 also shows a divider and a clock buffer.]
f_system = N × f_crystal
f_crystal < 200 MHz

Clock Distribution
Goal: minimization of uncertainty
Clock skew (spatial uncertainty)
  Systematic
Clock jitter (temporal uncertainty)
  Random cycle-to-cycle changes
Reading
Chapter 13 (Chandrakasan et al.), Clock Distribution, by Bailey
Chapter 12 (Chandrakasan et al.), PLLs and DLLs, by Maneatis
Chapter 10, Rabaey et al.

Clock Distribution
Tree: common, e.g. IBM S/390
Clock grid: DEC Alpha
Length-matched serpentines: Intel P6
Clock Distribution
H-Tree Network
[Figure: H-tree driven from CLOCK.]
Example: PowerPC 603 (Gerosa, JSSC 12/94)
Observe: only relative skew is important

Clock Network with Distributed Buffering
[Figure: main clock driver feeds secondary clock drivers, each serving a local area of modules from CLOCK.]
Reduces absolute delay, and makes power-down easier
Sensitive to variations in buffer delay
[Figure: predriver feeding four tree styles: binary tree, H-tree, X-tree, arbitrary matched tree.]

Example: IBM S/390
Clock skew
Webb, JSSC 11/97
Clock Tree Delays
Restle, VLSI 98

Impact of clock network sizing
[Figure]
Impact of clock network sizing (continued)
[Figure]

Final Stage: Tree vs. Grid
[Figure panels: RC-matched tree; grid.]
Courtesy of IEEE Press, New York, 2000
IR Emission Images
[Image labels: central buffer, clock repeaters, sector buffers, local clocks.]
Sanda, ISSCC 99

Example: DEC Alpha 21164
Clock Drivers
Clock Skew in Alpha Processor
[Figure]

DEC Alpha Evolution
Clock driver placements
[Figure panels: 21064, 21164, 21264.]
Gronowski, JSSC 5/98
Clock Skews
[Figure panels: 21064, 21164, 21264.]

Hybrid Grid
DEC Alpha 21264
Bailey, JSSC 11/98
Alpha 21264
[Figure]

Alpha 21264 Grids
[Figure panels: global clock; major clock grids.]
Data-Dependent Gate Loading
[Figure]

Multi-GHz Clock Networks
Phillip Restle, IBM Research
IEEE SSCTC Workshop on Design for Multi-GigaHertz Processors, San Francisco, Feb. 7, 2000
http://www.research.ibm.com/people/r/restle/mghz.html
http://www.research.ibm.com/people/r/restle/animations/dac01top.html
Clock Generation
Delay-Locked Loop (delay line based): f_REF -> Phase Det -> (U, D) -> Charge Pump -> Filter -> DL -> f_O
Phase-Locked Loop (VCO based): f_REF -> PD -> (U, D) -> CP -> Filter -> VCO -> f_O, with a ÷N block in the loop

Phase-Locked Loop Based Clock Generator
[Block diagram: reference clock -> phase detector -> (Up, Down) -> charge pump -> loop filter -> V_contr -> VCO -> local clock. The local clock feeds a divide-by-N block and a clock decode & buffer block that produces φ1, φ2, ...]
Acts also as clock multiplier
Loop Components
Phase Comparator: produces UP/DN pulses corresponding to phase difference
Charge Pump: sources/sinks current for duration of UP/DN pulses
Loop Filter: integrates current to produce control voltage
Voltage-Controlled Delay Line: changes delay proportionally to voltage
Voltage-Controlled Oscillator: generates frequency proportional to control voltage

PLL Jitter
[Figure]
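The way these components interact can be sketched with a cycle-by-cycle behavioral model of a delay-locked loop. This is an idealized first-order model under stated assumptions: the phase comparator's pulse width equals the phase error, the loop filter is a single capacitor, and the delay line is perfectly linear. All names and parameter values are hypothetical.

```python
def lock_dll(t_ref, delay0, k_vcdl, i_cp, c_filter, n_cycles):
    """Return the delay-line delay seen at each reference cycle."""
    v_ctrl = 0.0
    delays = []
    for _ in range(n_cycles):
        # Voltage-controlled delay line: delay changes proportionally to voltage.
        delay = delay0 + k_vcdl * v_ctrl
        delays.append(delay)
        # Phase comparator: positive error means UP, negative means DN.
        phase_error = t_ref - delay
        # Charge pump: sources or sinks i_cp for the duration of the pulse.
        charge = i_cp * phase_error
        # Loop filter: integrates the charge into the control voltage.
        v_ctrl += charge / c_filter
    return delays


# Assumed values: 1 ns reference period, 0.7 ns initial delay, 1 ns/V delay gain,
# 10 uA pump current, 50 fF filter capacitor (loop gain k_vcdl*i_cp/c_filter = 0.2).
delays = lock_dll(t_ref=1e-9, delay0=0.7e-9, k_vcdl=1e-9,
                  i_cp=10e-6, c_filter=50e-15, n_cycles=100)
print(f"first delay {delays[0] * 1e12:.1f} ps, last delay {delays[-1] * 1e12:.3f} ps")
```

With a loop gain of 0.2 the phase error shrinks by a factor of 0.8 every cycle, so the delay converges on the reference period. In this simple model a loop gain above 2 would make the error grow instead.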
DLL Locking
[Figure]
Courtesy of IEEE Press, New York, 2000

Clock Deskewing
Two clock spines, two DLLs, and a PD that controls them
Geannopoulos, ISSCC 98
Clock Ring
Clocks routed in parallel, in opposite directions
LCG aligns to the middle
Shibayama, ISSCC 98

Synchronous Distributed Oscillators
[Figure labels: VCOs; number of nearest neighbors.]
Mizuno, ISSCC 98
Distributed PLLs
Gutnik, ISSCC 2000

Intel Itanium™
Rusu, ISSCC 2000
Intel Itanium™ (continued)
[Figure]