EE141-Fall 2011
Digital Integrated Circuits
Lecture 2
Clock, I/O

Administrative Stuff
  Project Phase 4 due on Monday, Nov. 21, 10am
  Homework 9 due Thursday, December 1
  Visit to Intel on November 29

Last Lecture
  Last lecture: Timing
  Today's lecture: Clocks
  Reading (Ch. 10)

Timing

Pipelining
  [Figure: reference and pipelined versions of a datapath with inputs a and b and a log block]

Latch-Based Clocking
  [Figure: In, logic blocks F and G, clock labels C1, C2, C3; phases marked "Compute F" and "compute G"]
  (Domino logic almost always uses latch-based clocking)
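A minimal sketch of why latch-based clocking tolerates unbalanced logic blocks such as F and G. It assumes an idealized scheme in which each block gets one time slot (half a cycle in a two-phase design), each latch is transparent for exactly one slot, and setup time, latch delay, and skew are ignored. The function names, the slot length, and the delays are hypothetical illustration values, not figures from the lecture.

```python
def flip_flop_meets_timing(block_delays, slot):
    """Edge-triggered: every block must finish within its own slot."""
    return all(delay <= slot for delay in block_delays)


def latch_meets_timing(block_delays, slot):
    """Level-sensitive, idealized: latch i is transparent during
    [i*slot, (i+1)*slot]. Data leaves latch i once it has arrived and the
    latch is open, and must reach latch i+1 before that latch closes."""
    arrival = 0.0
    for i, delay in enumerate(block_delays):
        opens = i * slot
        depart = max(arrival, opens)   # early data waits for the latch to open
        arrival = depart + delay
        next_closes = (i + 2) * slot   # latch i+1 closes one slot after it opens
        if arrival > next_closes:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical delays: F overruns its slot, G has slack to give back.
    delays_f_g = [6.0, 4.0]
    slot = 5.0
    print("flip-flop design meets timing:", flip_flop_meets_timing(delays_f_g, slot))
    print("latch design meets timing:   ", latch_meets_timing(delays_f_g, slot))
```

In this model the borrowed time accumulates: a chain in which every block overruns its slot eventually misses a closing latch, which is the idealized counterpart of requiring that the total logic still fits in the total time available.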
Latch vs. Flip-flop
  In a flip-flop based system:
    Data launches on one rising edge
    And must arrive before the next rising edge
    If data arrives late, the system fails
    If it arrives early, time is wasted
    Flip-flops have hard edges
  In a latch-based system:
    Data can pass through a latch while it is transparent
    A long cycle of logic can borrow time into the next cycle
    As long as each loop finishes in one cycle

Latch vs. Flip-flop Summary
  Flip-flops are generally easier to use
  Most digital ASICs are designed with register-based timing
  But latches (both pulsed and level-sensitive) allow more flexibility
  And hence can potentially achieve higher performance
  Latches can also be made more tolerant of clock uncertainty
  More in EE241

Clock Distribution
  A single clock is generally used to synchronize all logic on the same chip
  Need to distribute the clock over the entire die
  While maintaining low skew/jitter
  (And without burning too much power)
  What's wrong with just routing wires to every point that needs a clock?

H-Tree Clock Distribution
  Equal wire length/number of buffers to get to every location
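The equal-wire-length property of an H-tree can be checked with a short recursive sketch. It assumes an ideal H-tree on a square die: branches alternate between horizontal and vertical, and the arm length halves after each horizontal/vertical pair. The function name, the die size, and the number of levels are hypothetical; buffers and wire RC are not modeled.

```python
def h_tree_sinks(x, y, arm, levels, dist=0.0, horizontal=True):
    """Return a list of (x, y, wire_length_from_root) for every leaf.

    x, y       : current branch point
    arm        : length of the two wires leaving this branch point
    levels     : number of branching levels still to build
    dist       : wire length accumulated from the root so far
    horizontal : whether this level branches left/right or up/down
    """
    if levels == 0:
        return [(x, y, dist)]
    sinks = []
    for sign in (-1.0, 1.0):
        nx = x + sign * arm if horizontal else x
        ny = y if horizontal else y + sign * arm
        # The arm shrinks only after a horizontal+vertical pair is complete.
        next_arm = arm if horizontal else arm / 2.0
        sinks += h_tree_sinks(nx, ny, next_arm, levels - 1,
                              dist + arm, not horizontal)
    return sinks


if __name__ == "__main__":
    # Hypothetical 16 x 16 die (arbitrary units), root at the center.
    sinks = h_tree_sinks(8.0, 8.0, 4.0, levels=4)
    lengths = {d for _, _, d in sinks}
    print(len(sinks), "sinks, distinct root-to-sink wire lengths:", lengths)
```

Every leaf sees the same wire length by construction, so any skew in such a tree comes from effects the sketch leaves out, such as load imbalance and device variation.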
More realistic H-tree
  [Figure] [Restle98]

Clock Grid
  [Figure: clock grid with drivers labeled G]
  No RC matching
  But huge power

Example: DEC Alpha 21164 (199)
  [Figure: clock waveform; location of clock driver on die, with pre-driver and final drivers]
  t_rise = 0.3 ns, t_skew = 10 ps, t_cycle = 3.3 ns
  2 phase single wire clock, distributed globally
  2 distributed driver channels
  Reduced RC delay/skew
  Improved thermal distribution
  3.7 nF clock load, 20 W power
  8 cm final driver width
  Local inverters for latching
  Conditional clocks in caches to reduce power
  More complex race checking
  Device variation

Clock s

Clock Skew in Alpha Processor

EV6 (Alpha 21264) Clocking
  600 MHz, 0.3 micron CMOS
  [Figure: global clock waveform; PLL]
  t_rise = 0.3 ns, t_skew = 0 ps, t_cycle = 1.67 ns
  2 phase, with multiple conditional buffered clocks
  2.8 nF clock load
  40 cm final driver width
  Local clocks can be gated off to save power
  Reduced load/skew
  Reduced thermal issues
  Multiple clocks complicate race checking
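As a rough cross-check of the EV6 numbers, the sketch below converts the cycle time to a frequency and estimates the dynamic power of the global clock load with P = C * Vdd^2 * f (a clock net charges and discharges once per cycle). The supply voltage is not given in the slides, so VDD_ASSUMED is a hypothetical value chosen only for illustration. The function names are also illustrative.

```python
def clock_frequency_hz(t_cycle_s):
    """Clock frequency implied by a cycle time."""
    return 1.0 / t_cycle_s


def clock_power_w(c_load_f, vdd_v, freq_hz):
    """Dynamic power of a net that switches full-swing once per cycle."""
    return c_load_f * vdd_v ** 2 * freq_hz


T_CYCLE_S = 1.67e-9     # EV6 cycle time from the slide
C_LOAD_F = 2.8e-9       # EV6 clock load from the slide
FREQ_HZ = 600e6         # EV6 clock frequency from the slide
VDD_ASSUMED = 2.0       # hypothetical supply voltage, NOT from the slide

if __name__ == "__main__":
    freq_mhz = clock_frequency_hz(T_CYCLE_S) / 1e6
    power_w = clock_power_w(C_LOAD_F, VDD_ASSUMED, FREQ_HZ)
    print(f"1 / t_cycle = {freq_mhz:.1f} MHz")
    print(f"clock power at assumed Vdd = {VDD_ASSUMED} V: {power_w:.2f} W")
```

The estimate scales with the square of the supply, so the result is only as good as the assumed Vdd. It also covers only the global clock load, not the gated local clocks.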
21264 Clocking

EV6 Clock Results
  [Figure: plots with scales in ps]
  G Skew (at Vdd/2 Crossings)
  G Rise Times (20% to 80% Extrapolated to 0% to 100%)

EV7 Clock Hierarchy (2002)
  Active Skew Management and Multiple Clock Domains
  [Figure labels: L2L_ (L2 Cache), N (Mem Ctrl), G (CPU Core), L2R_ (L2 Cache), PLL, SYS]
  + widely dispersed drivers
  + s compensate static and low-frequency variation
  + divides design and verification effort
  - design and verification is added work
  + tailored clocks

Clock Animations
  By Phillip Restle (IBM)
  http://www.research.ibm.com/people/r/restle/animations/DAC01top.html

I/O Design

Chip Packaging
  [Figure labels: L, Bonding wire, L, Chip, Lead frame, Pin, Mounting cavity]
  Bond wires (~2 µm) are used to connect the package to the chip
  Pads are arranged in a frame around the chip
  Pads are relatively large (~100 µm in 0.2 µm technology), with large pitch (100 µm)
  Many chips are pad limited
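To see how a chip becomes pad limited, the sketch below estimates the smallest square die edge that fits a given number of pads in a single ring at a 100 µm pitch, and compares it with the edge the core logic would need. It is a first-order model only: corner keep-outs, staggered pad rows, and pad depth are ignored. The pad count and core size are hypothetical, as are the function names.

```python
import math


def min_edge_for_pads_um(n_pads, pitch_um):
    """Smallest square die edge (in µm) that fits n_pads in one perimeter ring,
    assuming pads are spread evenly over the four sides."""
    pads_per_side = math.ceil(n_pads / 4)
    return pads_per_side * pitch_um


def is_pad_limited(n_pads, pitch_um, core_edge_um):
    """True when the pad ring, not the core logic, sets the die size."""
    return min_edge_for_pads_um(n_pads, pitch_um) > core_edge_um


if __name__ == "__main__":
    PITCH_UM = 100          # pad pitch quoted on the slide
    N_PADS = 400            # hypothetical pad count
    CORE_EDGE_UM = 5000     # hypothetical core edge (5 mm)
    edge = min_edge_for_pads_um(N_PADS, PITCH_UM)
    print(f"pad ring needs a die edge of at least {edge} µm")
    print("pad limited:", is_pad_limited(N_PADS, PITCH_UM, CORE_EDGE_UM))
```

In this model the pad count grows only linearly with the die edge while the core area grows with its square, which is why designs with many I/Os and little logic tend to be pad limited.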
Pad Frame
  [Figures: Layout; Die Photo]

Bonding Pad Design
  [Figure labels: Bonding Pad, GND, V_DD, 100 µm]

Chip Packaging
  An alternative is flip-chip:
    Pads are distributed around the chip
    The solder balls are placed on pads
    The chip is flipped onto the package
    Pads still large
    But can have many more of them

ESD Protection
  When a chip is connected to a board, there is an unknown (potentially large) static voltage difference
  Equalizing potentials requires (large) charge flow through the pads
  Diodes sink this charge into the substrate; need guard rings to pick it up

Pads + ESD Protection
  [Figure labels: PAD, R, D1, D2, X, C, Diode, V_DD, In, GND]

Next Lecture
  Power distribution
  Scaling