High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities
Introduction: About myself. What to expect out of this lecture: understand the current trends in IC design, and the challenges and opportunities. 2012 IBM Corporation
Agenda: Different Eras: Technology Era; Multi-Core Era (Design Era); Innovation Era (EDA Era). Innovation: Technology Innovation; Productivity Innovation.
The Technology Era: Frequency Scaling. Once upon a time, life was great: technology was the superman, design tagged along for the ride, and even EDA grabbed onto the designers' legs for the fun!
Characteristics of Single Thread Era: Dennard scaling; optical scaling / node migration; exponential frequency growth; expanding uarch complexity. [Figure: frequency (GHz, axis 0 to 6) for POWER4, POWER5, POWER6.] [Figure: TXs per core (axis 0 to 500) for POWER4, POWER5, POWER6.]
Single Thread Era EDA: static timing analysis of complex circuits; transistor analysis & optimization; transistor-level timing optimization. [Figure: histogram of # of paths (0 to 1200) versus slack (ps, -6 to 8), pre-tuning and post-tuning.] [Figure: circuit timing diagram with signals clkin, fbk, w0, w2, w3, w3_int; labels include precharge, evaluate, 1st timing, 2nd timing cycle.]
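To make the slack (ps) axis on this slide concrete, here is a minimal sketch of how slack can be computed on a timing graph: propagate the latest arrival times forward, propagate required times backward, and take the difference. The graph, node names, delays, and the function `compute_slacks` are illustrative assumptions, not taken from the lecture or from any production timer; a real transistor-level timer also models slews, clock phases, and precharge/evaluate constraints.

```python
from collections import defaultdict


def compute_slacks(edges, cycle_time_ps):
    """Return {node: slack_ps} for a DAG given as (src, dst, delay_ps) edges.

    Assumes non-negative delays, primary inputs arriving at t = 0, and every
    endpoint being required by cycle_time_ps (a simplification).
    """
    succ = defaultdict(list)
    pred_count = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        succ[src].append((dst, delay))
        pred_count[dst] += 1
        nodes.update((src, dst))

    # Topological order (Kahn's algorithm).
    ready = sorted(n for n in nodes if pred_count[n] == 0)
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for dst, _ in succ[node]:
            pred_count[dst] -= 1
            if pred_count[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("timing graph contains a cycle")

    # Forward pass: latest arrival time at each node.
    arrival = {n: 0.0 for n in nodes}
    for node in order:
        for dst, delay in succ[node]:
            arrival[dst] = max(arrival[dst], arrival[node] + delay)

    # Backward pass: earliest required time at each node.
    required = {n: float(cycle_time_ps) for n in nodes}
    for node in reversed(order):
        for dst, delay in succ[node]:
            required[node] = min(required[node], required[dst] - delay)

    return {n: required[n] - arrival[n] for n in nodes}


# Hypothetical four-node example; delays in ps.
example_edges = [("a", "g1", 30), ("b", "g1", 10), ("g1", "out", 50), ("b", "out", 20)]
slacks = compute_slacks(example_edges, cycle_time_ps=100)
print(slacks["out"], slacks["b"])  # 20.0 40.0
```

A negative slack marks a path that misses the cycle time; tuning aims to shift the histogram of path slacks toward non-negative values.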
End of Frequency Scaling: The Power Wall. Inability to scale oxide thickness & lower voltage resulted in a power wall for single-thread performance. [Figure: power density (W/cm^2, log scale, 0.001 to 1000) versus gate length (microns, 1 to 0.01; years 1994 and 2004 marked), showing active power, passive power, and the air cooling limit.]
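The power wall can be illustrated with the textbook first-order model of switching power, P proportional to C * V^2 * f. This model is an assumption brought in here, not something stated on the slide, and it ignores the passive (leakage) power the figure also shows. The sketch compares power density after one scaling step when the supply voltage scales with the dimensions and when it does not; the function name and the scaling factor `k` are illustrative.

```python
def power_density_ratio(k, scale_voltage=True):
    """Relative switching-power density after shrinking dimensions by 1/k.

    First-order assumptions: capacitance scales as 1/k, frequency as k,
    area per device as 1/k^2, and voltage as 1/k only if scale_voltage is True.
    Returns new density divided by old density.
    """
    capacitance = 1.0 / k
    voltage = 1.0 / k if scale_voltage else 1.0
    frequency = float(k)
    area = 1.0 / (k * k)
    power = capacitance * voltage * voltage * frequency
    return power / area


print(power_density_ratio(2))                       # 1.0: density stays constant
print(power_density_ratio(2, scale_voltage=False))  # 4.0: density grows as k^2
```

With voltage scaling, each step keeps power density flat; once voltage stops scaling, the same frequency gain multiplies density by k^2 per step, which is why frequency growth had to stop at the cooling limit.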
Frequency Scaling: POWER6 (65nm, 2007). 5+ GHz operation, >790M transistors, 341 mm^2 die. 65nm SOI with 10 levels of Cu interconnect. Same pipeline depth & power @ 2x frequency versus POWER5. [Die photo labels: Core 1; IFU / IDU; LSU; FXU; RU; BFU; DFU; VMX; L2 Dir; 2 MB L2 (x4); Mem. Cntl. (x2); SMP Fabric; L3 Controller.]
Technology Tantrums. End of frequency scaling: technology squeezing the design hard. Shock and awe of 65nm: wire delays overtaking gate delays. [Figure labels: Technology; Design; Designers with Technology.]
Multi-Core Era: The end of frequency scaling ushered in a new era of innovation with multi-core design.
POWER Processors Began the Multi-Core / Multi-Thread Era. POWER4 (2001): introduced first dual core. POWER5 (2004): dual core; introduces SMT (4 threads). POWER6 (2007): dual core, 4 threads; enhances SMT efficiency.
Life starts to become interesting: the technology ride gets very bumpy. [Figure: relative % improvement (0% to 100%) per technology node (180nm, 130nm, 90nm, 65nm, 45nm, 32nm), split into gain by traditional scaling and gain by innovation.] [Figure labels: 3fF BL (32 cells); WL; passing WL; node; BOX; deep trench cap; 18fF storage node; 4.0um; High-K Metal Gate.]
Multi-Core Era Limiters: SW parallelism; socket BW; power; technology complexity & rising costs. [Figure: log(performance) (1 to 100) versus technology node (90, 65, 45, 32, 22, 14, 10), comparing ideal growth with the likely multi-core path; point labels 2, 4, 8, 16, 32, 64.]
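One common way to quantify the SW parallelism limiter is Amdahl's law. It is brought in here as an illustration; the slide does not name it, and the parallel fractions below are hypothetical. The sketch shows how far the achievable speedup falls below the ideal (linear) growth as the core count rises.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Ideal growth would give a speedup equal to the core count.
for cores in (2, 4, 8, 16, 32, 64):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With 90% of the work parallel, the speedup can never exceed 10x regardless of core count, so the gap to the ideal curve widens at every node. Socket bandwidth and power impose further limits that this model does not capture.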
Multi-Core Advantage: Need to amplify effective socket throughput to achieve potential. [Figure labels: Compute Throughput Potential; Socket Throughput Limitation (power, memory bandwidth).]
Innovation Drive Architecture & Productivity Innovation
High-Performance Microprocessor Designs: Extending Multi-Core Gains (POWER processor). Coherence innovation to minimize socket-to-socket communication; low-power off-chip signaling technology; high-bandwidth memory buffer; eDRAM = large, low-power cache. [Figure labels: Compute Throughput Potential; Socket Throughput Limitation (power, memory bandwidth).]
Innovation Drive: System-Level Technologies. 3D stacking with through-silicon vias (figure labels: processor, memory, single socket); silicon photonics; FPGA accelerators; heterogeneous systems on chip with specialized functions and specialized cores (single-thread focused, throughput focused); flash memory / SSD.
Innovation at the Technology/Design Interface: Double/Triple Patterning. Need for double / triple patterning; EUV? [Figure: pitch (nm, 0 to 150) versus technology node (32nm, 20nm, future), showing device pitch and metal pitch against the single exposure limit and the double patterning limit.]
Productivity Innovation: Structured Synthesis and Large Block Synthesis. Custom designs consume a large amount of resources, so productivity is key. Merge the domains of custom design and synthesis to improve design productivity and quality: combine the custom and synthesis hierarchies and bring structure into synthesis (no longer just random logic). Global optimization view; targeted structured data paths and synthesis. A methodology with numerous algorithmic and practical innovations, spanning incremental logic design processing, data paths, structured clocking, and merged custom/synthesis techniques. [Figure labels: P/Z server macro; quad FPU.]
Productivity Innovation: Reduce Custom Design (Structured Synthesis). >10x reduction over 5 generations. Synthesis results w/ custom-like data flow alignment. [Figure: # of customs over time, normalized (0 to 1).]
Productivity Innovation: Reduced # of Design Partitions (Large Block Synthesis). From 60 logic macros, 25 customs, 14 unique arrays/RFs to 1 macro, 0 customs, 9 unique arrays/RFs. Reduced area & power; equal cycle time. [Figure: # of macros over time, normalized (0 to 1).]
Productivity & TAT Innovation: Gate-Level Analysis & Signoff. Large speedup; reduced cleanup; similar accuracy. [Figure: runtime, cleanup work, and accuracy (arbitrary units, 0 to 1) for TX-level versus gate-level analysis.]
Productivity & TAT Innovation: Hierarchical Abstraction & Multi-Threading. [Figure: projected chip timing runtime (hrs., 0 to 50) for base, cleanup, coarse parallelism, hierarchical abstracts, and multithreading.] Fast global analysis tools allow designers to iterate more often, resulting in improved final designs. Hierarchical abstraction & multi-threading are the most promising ways to minimize TAT. Applies to all disciplines (timing, verification, etc.).
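The combination of hierarchical abstraction and multi-threading can be sketched as follows: each block is analyzed independently (and therefore concurrently) and reduced to a small abstract, and the chip-level pass works only on the abstracts. Everything in this sketch is a simplifying assumption for illustration: the abstract is a single worst-case delay per block, the chip is a set of block chains, and the block names and delays are hypothetical. Real timing abstracts keep pin-to-pin delays and constraints.

```python
from concurrent.futures import ThreadPoolExecutor


def abstract_block(block):
    """Reduce one block to (name, worst internal path delay in ps)."""
    name, path_delays_ps = block
    return name, max(path_delays_ps)


def chip_worst_delay(blocks, chains, workers=4):
    """Pessimistic chip-level delay computed from per-block abstracts.

    blocks: list of (name, [path delays in ps]) analyzed independently.
    chains: lists of block names that a signal traverses in sequence.
    """
    # Blocks do not depend on each other, so they can be analyzed concurrently.
    # In CPython, threads mainly show the structure; CPU-bound analysis would
    # typically use processes or native code to get a real speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        abstracts = dict(pool.map(abstract_block, blocks))
    # The top-level pass never looks inside a block again.
    return max(sum(abstracts[name] for name in chain) for chain in chains)


blocks = [("ifu", [120, 95]), ("lsu", [80, 140]), ("fxu", [60])]
chains = [["ifu", "fxu"], ["lsu", "fxu"]]
print(chip_worst_delay(blocks, chains))  # 200
```

The design choice is the trade the slide points to: the abstract loses detail (here it is pessimistic), but the top-level run touches far less data and the per-block work parallelizes cleanly.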
Productivity Innovation: Retiming. [Figure: latch placement options in a logic cone: area/power too high; optimal; doesn't meet cycle time.] A significant fraction of logic designer effort is spent optimizing cycle boundaries. Retiming enables physical synthesis to optimally place latches in logic cones to balance timing/area/power. Invention is required to seamlessly handle divergence between functional RTL (Verilog/VHDL) and physical implementation throughout the methodology.
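As a toy illustration of the timing side of retiming, the sketch below takes a single chain of gates that must be split into two cycles and searches for the latch position that minimizes the longer of the two stages. This is a deliberately reduced stand-in, not the algorithm used in any synthesis tool: it handles one linear path, one latch, and timing only, whereas the slide describes balancing timing, area, and power across whole logic cones. The function name and the delays are hypothetical.

```python
def best_latch_position(gate_delays_ps):
    """Return (index, cycle_time_ps) for a latch placed before gate `index`.

    The chain is split into gate_delays_ps[:index] and gate_delays_ps[index:];
    the achievable cycle time is the delay of the slower stage.
    """
    if len(gate_delays_ps) < 2:
        raise ValueError("need at least two gates to place a latch between them")
    total = sum(gate_delays_ps)
    best = None
    first_stage = 0
    for index in range(1, len(gate_delays_ps)):
        first_stage += gate_delays_ps[index - 1]
        cycle_time = max(first_stage, total - first_stage)
        if best is None or cycle_time < best[1]:
            best = (index, cycle_time)
    return best


print(best_latch_position([30, 10, 40, 20]))  # (2, 60)
```

Placing the latch one gate earlier or later gives a 70 ps or 80 ps cycle here, which corresponds to the "doesn't meet cycle time" case on the slide when the target is 60 ps.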
Innovation: The sweet spot in this new era. [Figure: two stacked bars of designer time (0% to 100%), each split into wait for tools, implement, plan, and innovate.]
Hardware Programming. Traditional flow (1000s of RTL designers): VHDL / Verilog -> Synthesis -> Place & Route -> Hardware (LUT, FF, RAM). High-level flow (millions of software designers): HLL (C/C++, LiMe, OpenCL) -> HL Compiler -> VHDL / Verilog -> Synthesis -> Place & Route -> Hardware (LUT, FF, RAM).
Architectural Synthesis: Successive Refinement. Functional C/C++ model (metric: cache miss rate, etc.) -> cycle-accurate C/C++ model (metrics: performance models, CPI, etc.) -> RTL: VHDL/Verilog -> back-end design implementation and analysis (metrics: electrical, timing, area, noise, etc.).
Summary. The information technology landscape is changing dramatically. Value is in innovating across the entire stack, and increasingly higher up in the stack. Key problems remain to be solved in technology, design, and automation as technology continues to scale. Significant opportunities are emerging in new ways to solve system bottlenecks at every level: logic, architecture, memory. In the last several years, life has become very challenging but also very interesting as the ride has gotten a lot choppier. With challenges and opportunities abounding, organizations that grab these challenges and innovate their way out of the current dilemmas will be the winners. [Figure: two stacked bars of designer time (0% to 100%), each split into wait for tools, implement, plan, and innovate; labels: IP design content creation innovation; IP design process innovation; design implementation innovation; system value moving up the stack.]