Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 9: Microprocessor Examples
Wiley-Interscience and IEEE Press, January 2003
Nov. 14, 2003 Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic

Microprocessor Examples
- Clocking for Intel Microprocessors
  - IA-32 Pentium Pro
  - First IA-64 Microprocessor
  - Pentium 4
- Sun Microsystems UltraSPARC-III Clocking
  - Clocking and CSEs
- Alpha Clocking: A Historical Overview
  - Clocking and CSEs
- IBM Microprocessors
  - Level-Sensitive Scan Design
  - Examples of CSEs

Intel Microprocessor Features

                   Pentium II     Pentium III     Pentium 4
MPR Issue          June 1997      April 2000      Dec 2001
Clock Speed        266 MHz        1 GHz           2 GHz
Pipeline Stages    12/14          12/14           22/24
Transistors        7.5M           24M             42M
Cache (I/D/L2)     16K/16K/-      16K/16K/256K    12K/8K/256K
Die Size           203 mm²        106 mm²         217 mm²
IC Process         0.28 µm, 4M    0.18 µm, 6M     0.18 µm, 6M
Max Power          27 W           23 W            67 W

Source: Microprocessor Report Journal

IA-32 Pentium Pro
[Figure: Clock distribution network with deskewing circuit (Geannopoulos and Dai 1998), Copyright 1998 IEEE. Blocks shown: external clock and feedback inputs, clock generator, two delay lines with delay shift registers, deskew control, left spine, core, right spine.]

Adaptive Deskewing Technique
- Equalization of two clock distribution spines by compensating for delay mismatch
  - Delay lines
  - Phase detector
  - Controller
- Result: global clock skew of only 15 ps
- 0.25 µm technology
- 7.5M transistors
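
The delay-line, phase-detector, and controller loop listed above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the delay step (`STEP_PS`), the number of settings (`MAX_TAPS`), the dead zone, and all class and function names are hypothetical and are not taken from Geannopoulos and Dai. The idea shown is that the controller adds delay to whichever spine the phase detector reports as leading, until the two arrivals fall within the detector's resolution.

```python
from dataclasses import dataclass

STEP_PS = 12.0   # assumed delay-line resolution (hypothetical, not from the source)
MAX_TAPS = 16    # assumed number of delay settings (hypothetical)


@dataclass
class Spine:
    intrinsic_ps: float   # fixed distribution delay of this spine
    taps: int = 0         # current delay-line setting

    def arrival_ps(self) -> float:
        """Clock arrival time at the end of the spine."""
        return self.intrinsic_ps + self.taps * STEP_PS


def phase_detect(left: Spine, right: Spine, dead_zone_ps: float = STEP_PS / 2) -> int:
    """Return -1 if the left spine leads, +1 if the right spine leads, 0 if aligned."""
    diff = left.arrival_ps() - right.arrival_ps()
    if diff < -dead_zone_ps:
        return -1
    if diff > dead_zone_ps:
        return 1
    return 0


def deskew(left: Spine, right: Spine, max_iters: int = 64) -> float:
    """Add one delay step to the leading spine until the detector reports alignment.

    Returns the residual skew in picoseconds.
    """
    for _ in range(max_iters):
        verdict = phase_detect(left, right)
        if verdict == 0:
            break
        leader = left if verdict < 0 else right
        if leader.taps >= MAX_TAPS - 1:
            break  # delay line exhausted; mismatch exceeds the adjustable range
        leader.taps += 1
    return abs(left.arrival_ps() - right.arrival_ps())


left, right = Spine(intrinsic_ps=500.0), Spine(intrinsic_ps=555.0)
residual = deskew(left, right)
```

In this model the residual skew is bounded by half a delay step, which is why the achievable skew depends on the delay-line resolution rather than on the initial mismatch.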

IA-32 Pentium Pro
[Figure: Delay shift register (Geannopoulos and Dai 1998), Copyright 1998 IEEE. A delay line between In and Out is loaded by elements Load<1:15,2> and Load<0:14,2>, which are controlled by the delay shift register.]

IA-32 Pentium Pro
[Figure: Phase detector (Geannopoulos and Dai 1998), Copyright 1998 IEEE. The right and left clocks pass through bandwidth-control delays (delay = n) into phase detector 1 and phase detector 2, which produce the "Left Leads" and "Right Leads" outputs.]

First IA-64 Microprocessor
[Figure: Clock distribution topology (Rusu and Tam 2000), Copyright 2000 IEEE. Blocks shown: PLL, core clock, reference clock, deskew cluster, RCDs.]

Programmable Deskew Units
- Strategy similar to that in IA-32
- External differential clock
  - System bus frequency
- PLL generates internal clock
  - 2x frequency
- Clock distribution architecture
  - Balanced global clock tree
  - Multiple deskew buffers
  - Multiple local clock buffers

First IA-64 Microprocessor
[Figure: Deskew buffer architecture (Rusu and Tam 2000), Copyright 2000 IEEE. Blocks shown: global clock, deskew buffer, RCD, regional clock grid, regional feedback clock, reference clock, phase detector, digital filter, control FSM, deskew settings, TAP interface.]

First IA-64 Microprocessor
[Figure: Digitally controlled delay line (Rusu and Tam 2000), Copyright 2000 IEEE. Signals shown: Input, Output, Enable, delay control register.]
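
The deskew buffer figure chains a phase detector, a digital filter, and a control FSM that writes the deskew settings. A behavioral sketch of that chain is given below. It is a simplification under assumptions: the filter is modeled as a majority vote over a fixed window of phase-detector samples, and the register width, window length, and the `DeskewBuffer` name are hypothetical rather than taken from Rusu and Tam. The intent is only to show why filtering keeps the delay setting from reacting to every noisy sample.

```python
from collections import deque


class DeskewBuffer:
    """Behavioral model: phase-detector samples -> majority filter -> delay code."""

    def __init__(self, bits: int = 4, window: int = 4):
        # bits and window are illustrative assumptions, not values from the source.
        self.max_code = (1 << bits) - 1
        self.code = self.max_code // 2          # start mid-range
        self.samples = deque(maxlen=window)

    def sample(self, regional_leads: bool) -> int:
        """Record one phase-detector decision and return the current delay code.

        regional_leads is True when the regional feedback clock arrives
        before the reference clock.
        """
        self.samples.append(1 if regional_leads else -1)
        if len(self.samples) < self.samples.maxlen:
            return self.code                    # filter window not yet full
        vote = sum(self.samples)
        self.samples.clear()
        if vote > 0:                            # regional clock early: add delay
            self.code = min(self.code + 1, self.max_code)
        elif vote < 0:                          # regional clock late: remove delay
            self.code = max(self.code - 1, 0)
        return self.code                        # tie: leave the setting unchanged


buf = DeskewBuffer()
for decision in (True, True, False, True):
    setting = buf.sample(decision)
```

A tie in the vote leaves the code untouched, so alternating early/late decisions near lock do not make the delay setting toggle.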

First IA-64 Microprocessor
[Figure: Simulated regional clock-grid skew (Rusu and Tam 2000), Copyright 2000 IEEE]

First IA-64 Microprocessor
[Figure: Measured regional clock skew (Rusu and Tam 2000), Copyright 2000 IEEE]

Pentium 4
[Figure: Core and I/O clock generation (Kurd et al. 2001), Copyright 2001 IEEE. Blocks include the core PLL and I/O PLL, core and I/O clock distribution, clock enable generation, distribution and synchronization (1x and 2x enables), outbound deskew state machine, outbound address-bus and data-bus clocks, inbound buffers and inbound latching clocks, and strobe glitch protection and detection.]

Multi-GHz Clock Network in Pentium 4
- Three core and three I/O frequencies (six frequencies running concurrently)
- Differential off-chip reference clock
- PLL synthesizes core and I/O clocks
- Global core clock distribution
  - 47 independent clock domains
  - Each domain has a 5-bit deskew control register
- Clock skew < 20 ps
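
To make the role of the per-domain deskew registers concrete, the sketch below computes a 5-bit code for each of the 47 domains so that every domain is delayed toward the latest-arriving one. The domain count and register width come from the slide; the delay step per code (`STEP_PS`), the function names, and the calibration policy itself are assumptions made for illustration, not the procedure described by Kurd et al.

```python
NUM_DOMAINS = 47                  # from the slide
REG_BITS = 5                      # from the slide
MAX_CODE = (1 << REG_BITS) - 1    # 31
STEP_PS = 8.0                     # hypothetical delay added per register step


def calibrate(arrival_ps):
    """Pick a deskew code per domain that delays it toward the latest domain."""
    if len(arrival_ps) != NUM_DOMAINS:
        raise ValueError(f"expected {NUM_DOMAINS} domains, got {len(arrival_ps)}")
    latest = max(arrival_ps)
    codes = []
    for t in arrival_ps:
        code = round((latest - t) / STEP_PS)   # nearest whole number of steps
        codes.append(min(code, MAX_CODE))      # clamp to the 5-bit range
    return codes


def residual_skew(arrival_ps, codes):
    """Spread between earliest and latest domain after applying the codes."""
    adjusted = [t + c * STEP_PS for t, c in zip(arrival_ps, codes)]
    return max(adjusted) - min(adjusted)


# Example: 47 domains with up to 30 ps of uncorrected mismatch.
arrivals = [1000.0 + (i % 7) * 5.0 for i in range(NUM_DOMAINS)]
codes = calibrate(arrivals)
skew_after = residual_skew(arrivals, codes)
```

As long as no code saturates, the residual skew in this model is at most one delay step, so the register resolution, not the raw mismatch, sets the final skew.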

Pentium 4
[Figure: Logical diagram of core clock distribution (Kurd et al. 2001), Copyright 2001 IEEE. The PLL drives a 3-stage binary tree of clock repeaters feeding domain buffers 1 to 47; phase detectors sit between domain buffers and report to the test access port; each domain buffer drives local clock macros and their sequential elements.]

Pentium 4
[Figure: Example of local clock buffers generating various frequency, phase and types of clocks (Kurd et al. 2001), Copyright 2001 IEEE. Buffer types 1 to 3 take Gclk with Enable and Stretch controls, some through an adjustable delay buffer, and produce medium-frequency pulse clocks (phases 1 and 2), a slow-frequency pulse clock (phase 1), a medium-frequency normal clock (phase 1), and a fast-frequency pulse clock.]

Intel Clocking: Summary
- Increasing clock speeds and die size
- Balancing the clock skew in large designs using simple RC trees is becoming less effective
  - Insertion delay of 7-8 FO4 due to increased die size
  - Comparable to the clock period
- Clock skew control has been getting harder due to:
  - Increased PVT variations
  - Inductive effects at multi-GHz rates
- Use of active deskewing circuits

UltraSPARC Family Characteristics

                    UltraSPARC-I         UltraSPARC-II        UltraSPARC-III
Year                1995                 1997                 2000
Architecture        SPARC V9, 4-issue    SPARC V9, 4-issue    SPARC V9, 4-issue
Die size            17.7x17.8 mm²        12.5x12.5 mm²        15x15.5 mm²
# of transistors    5.2M                 5.4M                 23M
Clock frequency     167 MHz              330 MHz              1 GHz
Supply voltage      3.3 V                2.5 V                1.6 V
Process             0.5 µm CMOS          0.35 µm CMOS         0.15 µm CMOS
Metal layers        4 (Al)               5 (Al)               7 (Al)
Power consumption   <30 W                <30 W                <80 W

UltraSPARC-III: Clocking
- Performance-driven high-power clock distribution
- Eight logic gates per cycle
- High-speed semi-dynamic flip-flops with logic embedding
- Large hold time mandates use of advanced tools for fixing fast-path violations

UltraSPARC-III: Clocking
[Figure: Clock distribution delay in UltraSPARC-III (Heald et al. 2000), Copyright 2000 IEEE]

UltraSPARC-III: Clock Storage Elements
[Figure: Semidynamic flip-flop (Klass 1998), Copyright 1998 IEEE]
- Single-ended dynamic structure with use of keepers for static operation and use of clock pulsing
- Positive feedback (NAND) improves low-to-high setup time
- Fast, at the price of high internal and clock power

UltraSPARC-III: Clock Storage Elements
[Figure: Logic embedding in a semi-dynamic flip-flop, shown with a generic NMOS network and with a two-input XOR function (Klass, 1998), Copyright 1998 IEEE]
- A non-inverting logic function can be embedded by replacing the input transistor with an NMOS logic network
- Necessary for fitting 8 logic stages in the cycle time; also used for scan
- Complexity of embedded logic is limited by the NMOS stack depth

UltraSPARC-III: Clock Storage Elements
[Figure: Single-ended dynamic SDFF and differential dynamic SDFF (Klass, 1998), Copyright 1998 IEEE]
- Dynamic version of SDFF used in dynamic logic paths
- Outputs exercise a precharge-evaluate sequence to ensure monotonicity

UltraSPARC-III: Clock Storage Elements
[Figure: UltraSPARC-III flip-flop (Heald et al. 2000), Copyright 2000 IEEE]
- Final UltraSPARC-III flip-flop modified by decoupling keepers to increase immunity to α-particles
- Somewhat degraded speed and logic embedding property

Alpha Microprocessor Features

                     21064        21164        21264        21364
# transistors [M]    1.68         9.3          15.2         152
Die Size [mm²]       16.8x13.9    18.1x16.5    16.7x18.8    21.1x18.8
Process              0.75 µm      0.5 µm       0.35 µm      0.18 µm
Supply [V]           3.3          3.3          2.2          1.5
Power [W]            30           50           72           125
Freq. [MHz]          200          300          600          1200
Gates/Cycle          16           14           12           12

Alpha Microprocessors: Clocking
[Figure: Alpha microprocessor final clock driver location on the clock grid: (a) 21064, (b) 21164, (c) 21264]

Alpha Microprocessors: Clocking
[Figure: 21064 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE]

Alpha Microprocessors: Clocking
[Figure: 21164 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE]

Alpha Microprocessors: Clocking
[Figure: 21264 clock hierarchy (Gronowski et al. 1998), Copyright 1998 IEEE. The external clock drives the PLL, which drives the GCLK grid; the GCLK grid feeds local clocks, conditional local clocks, and box grids that in turn feed further local and conditional local clocks.]

Alpha Microprocessors: Clocking
[Figure: 21264 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE]

Alpha Microprocessors: Clocking
[Figure: 21364 major clock domains (Xanthopoulos et al. 2001), Copyright 2001. Domains shown: NCLK, GCLK grid, L2L, L2R, with DLLs between domains.]

Alpha Microprocessors: Clocking
[Figure: 21364, NCLK clock skew (Xanthopoulos et al. 2001), Copyright 2001 IEEE]

Alpha µp: Clock Storage Elements
[Figure: 21064 modified TSPC latches (Gronowski et al. 1998), Copyright 1998]

Alpha µp: Clock Storage Elements
[Figure: 21164: (a) phase-A latch, (b) phase-B latch (Gronowski et al. 1998), Copyright 1998 IEEE]

Alpha µp: Clock Storage Elements
[Figure: Embedding of logic into a latch: (a) 21064 TSPC latch, one level of logic; (b) 21164 latch, two levels of logic (Gronowski et al. 1998), Copyright 1998 IEEE]

Alpha µp: Clock Storage Elements
[Figure: 21264 flip-flop (Gronowski et al. 1998), Copyright 1998 IEEE]

Alpha Microprocessors: Timing
[Figure: Critical-path and race analysis for clock buffering and conditioning (Gronowski et al. 1998), Copyright 1998 IEEE. Two logic paths clocked from GCLK, with driver clock delay D and receiver clock delay R.]

Critical Path Definition and Criteria
- Identify common clock, D and R
- Maximize D
- Minimize R
- D + U ≤ T_cycle + R

Race Definition and Criteria
- Identify common clock, D and R
- Minimize D
- Maximize R
- D ≥ R + H

Hazard-Free Level-Sensitive Polarity-Hold Latch
[Figure: Hazard-free level-sensitive polarity-hold latch with signals +Clock, -Clock, Data, and Out (Eichelberger 1983)]

General LSSD Configuration
[Figure: Inputs (X) and the present state S_n feed the combinational logic, which produces the outputs (Y) and the next state S_{n+1}; the clocked storage elements hold the state and are driven by Clock, with Scan-In and Scan-Out ports.]
- Y = Y(X, S_n)
- S_{n+1} = f{S_n, X}
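
The two relations above describe a state machine whose outputs depend on the inputs and the present state, and whose next state is captured by the clocked storage elements on each clock. A minimal sketch of that model follows. The `run` helper and the 2-bit counter used as an example are illustrative choices, not part of the source.

```python
def run(initial_state, inputs, next_state, output):
    """Evaluate Y = Y(X, S_n) and S_{n+1} = f(S_n, X) once per clock.

    Returns the final state and the list of outputs, one per input.
    """
    state = initial_state
    outputs = []
    for x in inputs:
        outputs.append(output(x, state))   # combinational outputs: Y = Y(X, S_n)
        state = next_state(state, x)       # storage elements capture S_{n+1}
    return state, outputs


# Illustrative machine: a 2-bit counter with enable input x and a carry output.
def counter_next(state, x):
    return (state + x) % 4


def counter_carry(x, state):
    return int(state == 3 and x == 1)


final_state, carries = run(0, [1, 1, 1, 1, 0, 1], counter_next, counter_carry)
```

Because the state lives only in the clocked storage elements, scanning those elements in and out gives full control and observation of S_n, which is the basis of the scan design discussed in this section.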

LSSD Shift Register Latch
[Figure: LSSD shift register latch. The L1 latch takes Data under clock C and Scan_In under shift clock A, producing +L1 and -L1; the L2 latch is loaded from L1 under clock B, producing +L2 and -L2.]

LSSD Double Latch Design
[Figure: LSSD double latch design. Primary inputs X and state S_n feed the combinational logic, which produces primary outputs Z and next-state signals X1 to Xn; each is captured by an L1/L2 latch pair. Clock C1 loads system data, A shift and B shift clocks move data along the scan path from Scan In to Scan Out.]
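
The behavior of a chain of L1/L2 latch pairs can be sketched in a few lines. The model below assumes the conventional LSSD clocking: clock C loads system data into L1, shift clock A loads the scan input into L1, and clock B copies L1 into L2, with A and B non-overlapping. The `SRL` class and `scan_shift` helper are hypothetical names for illustration, and the model ignores all timing.

```python
class SRL:
    """Shift register latch: an L1/L2 latch pair (behavioral, no timing)."""

    def __init__(self):
        self.l1 = 0
        self.l2 = 0

    def pulse_c(self, data):
        """System clock C: L1 captures the system data input."""
        self.l1 = data

    def pulse_a(self, scan_in):
        """Shift clock A: L1 captures the scan input."""
        self.l1 = scan_in

    def pulse_b(self):
        """Clock B: L2 captures L1."""
        self.l2 = self.l1


def scan_shift(chain, scan_in_bit):
    """Apply one A pulse followed by one B pulse to the whole chain.

    Each stage's scan input is the L2 output of the previous stage.
    Returns the bit that was at the scan output before the shift.
    """
    scan_out = chain[-1].l2
    sources = [scan_in_bit] + [srl.l2 for srl in chain[:-1]]
    for srl, bit in zip(chain, sources):
        srl.pulse_a(bit)          # A changes only L1, so L2 values stay stable
    for srl in chain:
        srl.pulse_b()             # B then moves every L1 into its L2
    return scan_out


chain = [SRL() for _ in range(4)]
for bit in (1, 0, 1, 1):
    scan_shift(chain, bit)        # load a test pattern through Scan In
```

Splitting each shift into separate A and B pulses is what makes the chain race-free in this model: no latch is both read and written during the same pulse.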

IBM S/390 Parallel Server Processor
[Figure: LSSD SRL with multiplexer used in the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission. Signals shown: CLKG, CLKL, A_CLK, B_CLK, CLK_ENABLE, TEST_DISABLE, SCAN_IN, IN_A, IN_B, SELECT_A, SELECT_N, L1, L2 (SCAN_OUT).]

IBM S/390 Parallel Server Processor
[Figure: Static multiplexer version of the SRL used in the IBM S/390 G4 (Sigal et al. 1997), reproduced by permission. Signals shown: A_CLK, B_CLK, CLKL, TEST_DISABLE, SCAN_IN, IN_A, IN_B, IN_C, IN_M, IN_N, SELECT_A, SELECT_N, mux_a, mux_m_n, (SCAN_OUT).]

IBM S/390 Parallel Server Processor
[Figure: Clocked storage element used in the non-timing-critical macros of the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission. Signals shown: CLKG, C1, C2, A_CLK, B_CLK, C1_DISABLE, C2_ENABLE, SCAN_IN, IN, L1, L2 (SCAN_OUT).]

IBM S/390 Parallel Server Processor
[Figure: Clock-generation element used to detect problems created by fast paths in the IBM S/390 G4 processor (Sigal et al. 1997), reproduced by permission. Signals shown: CLKG, B_CLK, C1, C2, C1_DISABLE, C2_ENABLE, UNOVERLAP.]

IBM PowerPC Processor
[Figure, parts (a) and (b): The experimental IBM PowerPC processor (Silberman et al. 1998), reproduced by permission. Blocks shown: true mux and complement mux with select inputs SEL 0 to SEL n-1, master latch, slave latch, and clock and scan control signals including SCAN_GATE, SEL_EXT, NCLK, and CLK.]

IBM PowerPC 603: Master-Slave Latch
[Figure: The PowerPC 603 MSL (Gerosa et al. 1994), Copyright 1994 IEEE. Signals shown: in, out, SCAN in, ACLK, C1, C2.]

IBM PowerPC 603: Local Generator
[Figure: The PowerPC 603 local clock regenerator (Gerosa et al. 1994), Copyright 1994 IEEE. Signals shown: GCLK, ACLK, C1, C2, C1_FREEZE, C2_FREEZE, C1_TEST, C2_TEST, SCAN_C1, WAITCLK, OVERRIDE.]

Summary
- Intel Microprocessors
  - Active clock deskewing in Pentium processors
- Sun Microsystems Processors
  - Semidynamic flip-flop (one of the fastest single-ended flip-flops today, "soft-edge")
- Alpha Processors
  - Performance leader in the 90s
  - Incorporating logic into CSEs
- IBM Processors
  - Design for testability techniques
  - Low-power champion PowerPC 603