Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 9: Microprocessor Examples
Wiley-Interscience and IEEE Press, January 2003
Microprocessor Examples
- Clocking for Intel Microprocessors
  - IA-32 Pentium Pro
  - First IA-64 Microprocessor
  - Pentium 4
- Sun Microsystems UltraSPARC-III Clocking
  - Clocking and CSEs
- Alpha Clocking: A Historical Overview
  - Clocking and CSEs
- IBM Microprocessors
  - Level-Sensitive Scan Design
  - Examples of CSEs
Nov. 14, 2003. Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic
Intel Microprocessor Features

Feature           Pentium II     Pentium III      Pentium 4
MPR Issue         June 1997      April 2000       Dec 2001
Clock Speed       266 MHz        1 GHz            2 GHz
Pipeline Stages   12/14          12/14            22/24
Transistors       7.5M           24M              42M
Cache (I/D/L2)    16K/16K/-      16K/16K/256K     12K/8K/256K
Die Size          203 mm²        106 mm²          217 mm²
IC Process        0.28 µm, 4M    0.18 µm, 6M      0.18 µm, 6M
Max Power         27 W           23 W             67 W

Source: Microprocessor Report Journal
IA-32 Pentium Pro
Figure: Clock distribution network with deskewing circuit (Geannopoulos and Dai 1998), Copyright 1998 IEEE.
Labels: Ext Clk, FB Clk, CLK Gen, two Delay Lines each with a Delay SR, Deskew Control, PD, Left Spine, Right Spine, Core.

IA-32 Pentium Pro
Figure: Delay shift register (Geannopoulos and Dai 1998), Copyright 1998 IEEE.
Labels: In, Out, Delay Line, Delay Shift Register, Load<1:15,2>, Load<0:14,2>.

IA-32 Pentium Pro
Figure: Phase detector (Geannopoulos and Dai 1998), Copyright 1998 IEEE.
Labels: Left Clk, Right Clk, Bandwidth Control, Delay = n (two instances), Phase Detector 1 (output: Left Leads), Phase Detector 2 (output: Right Leads).
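
The three Pentium Pro figures above can be read as a feedback loop: a phase detector compares the left and right spine clocks, and the deskew control adjusts one of two delay lines through its shift register. The following is only a behavioral sketch of that idea. The tap count, the delay per tap, the phase-detector dead zone, and the update rule are all assumptions chosen for illustration; none of them come from Geannopoulos and Dai, and the Bandwidth Control input is not modeled.

```python
from dataclasses import dataclass

TAPS = 16            # assumed number of delay-line settings
STEP_PS = 12.0       # assumed delay added per tap (ps)
DEAD_ZONE_PS = 6.0   # assumed phase-detector resolution (ps)


@dataclass
class DelayLine:
    """Delay line whose setting is held in a shift register (hypothetical model)."""
    taps: int = 0

    def shift_up(self) -> bool:
        """Add one tap of delay; return False if already at the maximum."""
        if self.taps < TAPS - 1:
            self.taps += 1
            return True
        return False

    def shift_down(self) -> bool:
        """Remove one tap of delay; return False if already at zero."""
        if self.taps > 0:
            self.taps -= 1
            return True
        return False

    @property
    def delay_ps(self) -> float:
        return self.taps * STEP_PS


def phase_detect(left_ps: float, right_ps: float) -> str:
    """Compare clock arrival times at the two spines."""
    diff = right_ps - left_ps  # positive: the left edge arrives first
    if diff > DEAD_ZONE_PS:
        return "left_leads"
    if diff < -DEAD_ZONE_PS:
        return "right_leads"
    return "locked"


def deskew(left_path_ps: float, right_path_ps: float, max_iters: int = 64):
    """Step the delay lines until the phase detector reports lock."""
    left, right = DelayLine(), DelayLine()
    for _ in range(max_iters):
        verdict = phase_detect(left_path_ps + left.delay_ps,
                               right_path_ps + right.delay_ps)
        if verdict == "locked":
            break
        if verdict == "left_leads":
            # Prefer removing delay from the late side before adding to the early side.
            if not right.shift_down():
                left.shift_up()
        else:
            if not left.shift_down():
                right.shift_up()
    residual = (left_path_ps + left.delay_ps) - (right_path_ps + right.delay_ps)
    return residual, left.taps, right.taps
```

With these assumed numbers, `deskew(100.0, 160.0)` adds delay only to the left spine until the two arrival times fall within the dead zone. The residual skew can never be driven below the detector resolution, which is why the tap step and the dead zone have to be chosen together.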
First IA-64 Microprocessor
Figure: Clock distribution topology (Rusu and Tam 2000), Copyright 2000 IEEE.
Labels: PLL, Reference Clock, Core Clock, Deskew Cluster, RCDs.

First IA-64 Microprocessor
Figure: Deskew buffer architecture (Rusu and Tam 2000), Copyright 2000 IEEE.
Labels: Global Clock, Reference Clock, TAP Interface, Phase Detector, Digital Filter, Control FSM, Deskew Settings, Deskew Buffer, RCD, Regional Clock Grid, Regional Feedback Clock.

First IA-64 Microprocessor
Figure: Digitally controlled delay line (Rusu and Tam 2000), Copyright 2000 IEEE.
Labels: Input, Output, Enable, Delay Control Register.

First IA-64 Microprocessor
Figure: Simulated regional clock-grid skew (Rusu and Tam 2000), Copyright 2000 IEEE.

First IA-64 Microprocessor
Figure: Measured regional clock skew (Rusu and Tam 2000), Copyright 2000 IEEE.
Pentium 4
Figure: Core and I/O clock generation (Kurd et al. 2001), Copyright 2001 IEEE.
Labels: bus clock and bus clock#, Core PLL, I/O PLL, core Clk distribution, I/O data Clk distribution, I/O feedback clock, divide by 4, clock enable generator, clock enable distribution & sync (1x-Clk and 2x-Clk enables), addr. bus and data bus outbound clocks, outbound deskew state machine, inbound clocks gen state machine, inbound latching clocks, strobe glitch protection and detection, input buffers, MSFFs.

Pentium 4
Figure: Logical diagram of core clock distribution (Kurd et al. 2001), Copyright 2001 IEEE.
Labels: PLL, 3-stage binary tree of clock repeaters, Domain Buffers 1 to 47, Phase Detectors, To Test Access Port, Local Clock Macros, Sequential Elements.

Pentium 4
Figure: Example of local clock buffers generating various frequency, phase and types of clocks (Kurd et al. 2001), Copyright 2001 IEEE.
Labels: Gclk, Enable 1, Enable 2, Stretch 0, Stretch 1, SlowClkSync, Adjustable Delay Buffer, ClkBuf Type 1, ClkBuf Type 2, ClkBuf Type 3.
Clock outputs: fast freq. pulse clk; medium freq. pulse clk phase 1; medium freq. pulse clk phase 2; medium freq. normal clk phase 1; slow freq. pulse clk phase 1.
UltraSPARC Family Characteristics

Feature             UltraSPARC-I        UltraSPARC-II       UltraSPARC-III
Year                1995                1997                2000
Architecture        SPARC V9, 4-issue   SPARC V9, 4-issue   SPARC V9, 4-issue
Die size            17.7 x 17.8 mm²     12.5 x 12.5 mm²     15 x 15.5 mm²
# of transistors    5.2M                5.4M                23M
Clock Frequency     167 MHz             330 MHz             1 GHz
Supply voltage      3.3 V               2.5 V               1.6 V
Process             0.5 µm CMOS         0.35 µm CMOS        0.15 µm CMOS
Metal layers        4 (Al)              5 (Al)              7 (Al)
Power consumption   <30 W               <30 W               <80 W
UltraSPARC-III: Clocking
Figure: Clock distribution delay in UltraSPARC-III (Heald et al. 2000), Copyright 2000 IEEE.

UltraSPARC-III: Clock Storage Elements
Figure: Semidynamic flip-flop (Klass 1998), Copyright 1998 IEEE.
Labels: D, Clk, Q, internal node S, NAND, Inv1 to Inv6, MP1, MP2, MN1 to MN5, Vdd.

UltraSPARC-III: Clock Storage Elements
Figure: (a) Logic embedding in a semidynamic flip-flop; (b) Two-input XOR function (Klass 1998), Copyright 1998 IEEE.
Labels: (a) inputs D1, D2, ..., DN feeding an NMOS network; (b) inputs D1, D2 with transistors MN2a to MN2d.

UltraSPARC-III: Clock Storage Elements
Figure: Dynamic versions of semidynamic flip-flop: (a) single-ended; (b) differential (Klass 1998), Copyright 1998 IEEE.

UltraSPARC-III: Clock Storage Elements
Figure: UltraSPARC-III flip-flop (Heald et al. 2000), Copyright 2000 IEEE.
Labels: D, Clk, Q, internal node S, NAND, Inv1 to Inv5, MP1 to MP7, MN1 to MN7, Vdd.
Alpha Microprocessor Features

Feature             21064         21164         21264         21364
# transistors [M]   1.68          9.3           15.2          152
Die Size [mm²]      16.8 x 13.9   18.1 x 16.5   16.7 x 18.8   21.1 x 18.8
Process             0.75 µm       0.5 µm        0.35 µm       0.18 µm
Supply [V]          3.3           3.3           2.2           1.5
Power [W]           30            50            72            125
Clk Freq. [MHz]     200           300           600           1200
Gates/Cycle         16            14            12            12
Alpha Microprocessors: Clocking
Figure: Alpha microprocessor final clock driver location: (a) 21064, (b) 21164, (c) 21264.
Labels: clock grid.

Alpha Microprocessors: Clocking
Figure: 21064 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE.

Alpha Microprocessors: Clocking
Figure: 21164 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE.

Alpha Microprocessors: Clocking
Figure: 21264 clock hierarchy (Gronowski et al. 1998), Copyright 1998 IEEE.
Labels: ext. clk, PLL, GCLK Grid, Box Clk Grid, local clk, cond. local clk (gated by cond).

Alpha Microprocessors: Clocking
Figure: 21264 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE.

Alpha Microprocessors: Clocking
Figure: 21364 major clock domains (Xanthopoulos et al. 2001), Copyright 2001.
Labels: GCLK grid, NCLK, L2LClk, L2RClk, three DLLs.

Alpha Microprocessors: Clocking
Figure: 21364, NCLK clock skew (Xanthopoulos et al. 2001), Copyright 2001 IEEE.
Alpha µp: Clock Storage Elements
Figure: 21064 modified TSPC latches (Gronowski et al. 1998), Copyright 1998.
Labels: D, Clk, Q, internal node X, transistors P1 to P5 and N1 to N5.

Alpha µp: Clock Storage Elements
Figure: 21164: (a) phase-a latch, (b) phase-b latch (Gronowski et al. 1998), Copyright 1998 IEEE.
Labels: D, Clk, Q, internal node X.

Alpha µp: Clock Storage Elements
Figure: Embedding of logic into a latch: (a) 21064 TSPC latch, one level of logic; (b) 21164 latch, two levels of logic (Gronowski et al. 1998), Copyright 1998 IEEE.
Labels: (a) inputs D1, D2, node X, Clk, Q; (b) inputs D1 to D4, nodes X1 and X2, Clk, Q.

Alpha µp: Clock Storage Elements
Figure: 21264 flip-flop (Gronowski et al. 1998), Copyright 1998 IEEE.
Labels: D, Clk, Q.
Alpha Microprocessors: Timing

Critical Path Definition and Criteria
- Identify common clock, D and R
- Maximize D
- Minimize R
- $D + U - R \le T_{cycle}$

Race Definition and Criteria
- Identify common clock, D and R
- Minimize D
- Maximize R
- $D \ge R + H$

Figure: Critical-path and race analysis for clock buffering and conditioning (Gronowski et al. 1998), Copyright 1998 IEEE.
Labels: GCLK, cond, D, R, Logic, D/Q storage elements.
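
The two criteria on this slide translate directly into slack calculations. The sketch below assumes that D is the delay from the common clock point to the receiver's data input, that R is the delay from the same point to the receiver's clock input, and that U and H are the receiver's setup and hold times. The class and function names, and the numbers in the usage note, are hypothetical and are not taken from Gronowski et al.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays measured from the common clock point (all in the same unit)."""
    d_max: int  # slowest delay to the receiver's data input (D, maximized)
    d_min: int  # fastest delay to the receiver's data input (D, minimized)
    r_min: int  # earliest receiver clock arrival (R, minimized)
    r_max: int  # latest receiver clock arrival (R, maximized)
    setup: int  # U
    hold: int   # H


def min_cycle_time(p: PathTiming) -> int:
    """Smallest T_cycle that satisfies D + U - R <= T_cycle."""
    return p.d_max + p.setup - p.r_min


def critical_path_slack(p: PathTiming, t_cycle: int) -> int:
    """Positive when the critical-path criterion holds with margin."""
    return t_cycle - min_cycle_time(p)


def race_slack(p: PathTiming) -> int:
    """Positive when the race criterion D >= R + H holds with margin."""
    return p.d_min - (p.r_max + p.hold)
```

For example, with made-up values in picoseconds, `PathTiming(d_max=780, d_min=140, r_min=60, r_max=90, setup=40, hold=30)` needs a cycle of at least 760 ps and has 20 ps of race margin. Note that delaying the receiver clock (larger R) helps the critical path and hurts the race, which is why the two checks use opposite extremes of D and R.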
Hazard-Free Level-Sensitive Polarity-Hold Latch
Figure: Latch schematic (Eichelberger 1983).
Labels: Data, +Clock, -Clock, Out.

General LSSD Configuration
Figure: Combinational logic with inputs (X) and outputs (Y), plus clocked storage elements that hold the present state $S_n$ and receive the next state $S_{n+1}$.
Labels: Clock, Scan-In, Scan-Out.
- $Y = Y(X, S_n)$
- $S_{n+1} = f\{S_n, X\}$

LSSD Shift Register Latch
Figure: Shift register latch built from an L1 latch and an L2 latch.
Labels: -Scan_In, -Data, +A Clk, -C Clk, +B Clk, L1 outputs (-L1, +L1), L2 outputs (-L2, +L2).

LSSD Double Latch Design
Figure: Combinational logic with primary inputs X, primary outputs Z, and state $S_n$ held in L1/L2 latch pairs X1, X2, X3, ..., Xn.
Labels: C1, A Shift, B Shift, Scan In, Scan Out.
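
The LSSD slides above describe shift register latches chained so that state can be shifted in and out serially. The following is a minimal behavioral sketch under the usual LSSD reading: the C clock loads system data into L1, the A clock loads scan data into L1, and the B clock copies L1 into L2, with A and B pulsed alternately and never overlapping. Signal polarities (the + and - prefixes in the figure) are not modeled, and the class and method names are hypothetical.

```python
class SRL:
    """One shift register latch: an L1 latch followed by an L2 latch."""

    def __init__(self):
        self.l1 = 0
        self.l2 = 0

    def pulse_c(self, data):
        """System clock: L1 captures the data input."""
        self.l1 = data

    def pulse_a(self, scan_in):
        """Shift clock A: L1 captures the scan input."""
        self.l1 = scan_in

    def pulse_b(self):
        """Shift clock B: L2 captures L1."""
        self.l2 = self.l1


class ScanChain:
    """SRLs connected L2 output to the next SRL's scan input."""

    def __init__(self, n):
        self.srls = [SRL() for _ in range(n)]

    def state(self):
        return [s.l2 for s in self.srls]

    def shift(self, scan_in_bit):
        """One A pulse then one B pulse; returns the bit leaving the chain."""
        scan_out = self.srls[-1].l2
        # L2 values are stable while A is high because B is low.
        sources = [scan_in_bit] + [s.l2 for s in self.srls[:-1]]
        for srl, bit in zip(self.srls, sources):
            srl.pulse_a(bit)
        for srl in self.srls:
            srl.pulse_b()
        return scan_out

    def load(self, bits):
        """Shift a test pattern in; the first bit ends up in the last SRL."""
        for bit in bits:
            self.shift(bit)

    def capture(self, next_state):
        """One system cycle: C clock into L1, then B clock into L2."""
        for srl, bit in zip(self.srls, next_state):
            srl.pulse_c(bit)
        for srl in self.srls:
            srl.pulse_b()

    def unload(self):
        """Shift the whole state out, last SRL first."""
        return [self.shift(0) for _ in self.srls]
```

A test sequence would call `load` with a pattern, apply one `capture` with the combinational logic's response, and then `unload` to observe it. Because every latch is level-sensitive and the clocks do not overlap, the result in this model does not depend on the order in which the latches are evaluated within a pulse.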
IBM S/390 Parallel Server Processor
Figure: LSSD SRL with multiplexer used in the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission.
Labels: CLKG, CLKL, A_CLK, B_CLK, CLK_ENABLE, TEST_DISABLE, SCAN_IN, IN_A, IN_B, SELECT_A, SELECT_N, L1, L2, Q (SCAN_OUT).

IBM S/390 Parallel Server Processor
Figure: Static multiplexer version of the SRL used in the IBM S/390 G4 (Sigal et al. 1997), reproduced by permission.
Labels: CLKL, A_CLK, B_CLK, TEST_DISABLE, SCAN_IN, IN_A, IN_B, IN_C, IN_M, IN_N, SELECT_A, SELECT_N, mux_a, mux_m_n, Q (SCAN_OUT).

IBM S/390 Parallel Server Processor
Figure: The clocked storage element used in the non-timing-critical macros of the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission.
Labels: CLKG, A_CLK, B_CLK, C1, C2, C1_DISABLE, C2_ENABLE, SCAN_IN, IN, L1, L2, Q (SCAN_OUT).

IBM S/390 Parallel Server Processor
Figure: The clock-generation element used to detect problems created by fast paths in the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission.
Labels: CLKG, B_CLK, C1, C2, C1_DISABLE, C2_ENABLE, UNOVERLAP.
IBM PowerPC Processor
Figure: The experimental IBM PowerPC processor, panels (a) and (b) (Silberman et al. 1998), reproduced by permission.
Labels: SCAN_GATE, SG, SEL_EXT i, SEL i, NCLK, CLK, SEL 0 to SEL n-1, D 0 to D n-1, True Mux, Complement Mux, Master Latch, Slave Latch, OT, OC, SO, SR.

IBM PowerPC 603: Master-Slave Latch
Figure: The PowerPC 603 MSL (Gerosa et al. 1994), Copyright 1994 IEEE.
Labels: D in, D out, SCAN in, ACLK, C1, C2, VDD.

IBM PowerPC 603: Local Clk Generator
Figure: The PowerPC 603 local clock regenerator (Gerosa et al. 1994), Copyright 1994 IEEE.
Labels: GCLK, ACLK, SCAN_C1, C1, C2, C1_FREEZE, C1_TEST, C2_FREEZE, C2_TEST, WAITCLK, OVERRIDE.
Summary
- Intel Microprocessors
  - Active clock deskewing in Pentium processors
- Sun Microsystems Processors
  - Semidynamic flip-flop (one of the fastest single-ended flip-flops today, soft-edge)
- Alpha Processors
  - Performance leader in the 90s
  - Incorporating logic into CSEs
- IBM Processors
  - Design for testability techniques
  - Low-power champion PowerPC 603