Implementation Challenges and Solutions of Low-Power, High-Performance Memory Systems

WHITE PAPER
Implementation Challenges and Solutions of Low-Power, High-Performance Memory Systems
Rambus Inc., 1050 Enterprise Way, Suite 700, Sunnyvale, CA 94089
Phone: +1 408 462 8000 Fax: +1 408 462 8001 www.rambus.com
2014 Rambus Inc.

Mobile devices and their demand for rapid innovation have fundamentally and forever changed the semiconductor industry. These devices have fueled tremendous innovation in the last few years, bringing drastic improvements in performance, power and cost efficiency. They also demand condensed product development cycles, which accelerate both the rate of and the need for innovation. The only thing that has remained the same is our industry's ability to drive innovation, meet new technical challenges and develop new processes and technologies.

This is the first of a two-part whitepaper that examines the key factors driving new requirements in mobile memory system design and analysis. Trends include:

- Memory speed rising rapidly to keep pace with processor performance
- Energy consumed per bit of transmission decreasing proportionally
- Limited footprint forcing adoption of package-on-package and other package-level integration technology
- Commoditization pushing growth from the high to the lower end of the market

When translated into electrical requirements, the trends in mobile memory pose tremendous challenges in signal and power integrity for SoC and system developers, including:

- Decreasing timing and voltage margin
- Escalating circuit jitter sensitivity to supply noise
- Rising supply noise
- Increasing signal and supply noise coupling
- Growing signal integrity degradations due to manufacturing variations

Memory Solutions Trends

Mobile processor performance has been rising rapidly, as is evident from the advance of mobile processors from a single core at <500MHz (the original iPhone) to quad cores at close to 2GHz (Galaxy S4, US edition) in just six years. This makes it challenging to develop memory solutions that can keep pace and fully utilize the power of the processor. Figure 1 shows the historical bandwidth trends of three mainstream memory solutions (DDR, LPDDR and GDDR) and one emerging solution, HMC (Hybrid Memory Cube). While it took 13 years (2000-2013) for DDR bandwidth to increase an order of magnitude from 200 to 2133Mbps, LPDDR bandwidth, driven by the demand of mobile processors, experienced the same increase in only nine years (2005-2014).

Figure 1. Historical Memory Bandwidth Trends. Source: Internal Analysis

Designers look for solutions that enable memory bandwidth to trend up rapidly. At the same time, they contend with power requirements that remain relatively unchanged, including:

- Thermal envelope (the tolerance of the human body does not change)
- Power density of the IC
- Battery size

Figure 1 reveals that in the same period in which LPDDR bandwidth increased by an order of magnitude (2005 to 2014), the general trend in energy per bit decreased by roughly an order of magnitude (see footnote 1), resulting in relatively constant power.

In addition to delivering higher bandwidth at the same low power, mobile device developers must contend with low profiles and limited real estate available for adding parts. Taken together, these constraints force the adoption of more integration to expand functionality. This trend is directly related to the adoption of package-on-package (PoP) technology, especially by high-end smartphones. In addition to presenting some unique electrical challenges, PoP moves the memory interface out of sight, which significantly reduces the ability to observe one of the highest-speed data transmission channels in the system.

In contrast, commoditization in the mobile market is pushing growth down from the higher-end to the lower-end market. Figure 2 shows the projected smartphone unit shipment growth by price range up to 2018 [1]. The CAGRs for the unit price ranges of $0-199, $200-399 and $400+ are 36%, 9% and 6%, respectively. Therefore, while the growth of high-end and, to a lesser extent, mid-range smartphones is tapering, low-end smartphones will experience tremendous growth in the next 5 years.

Footnote 1: The energy per bit curve in Figure 1 indicates a general trend and is not specific to LPDDR evolution.
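The bandwidth figures quoted for Figure 1 can be turned into annualized growth rates. The following is a minimal Python sketch rather than anything taken from the paper: it uses the DDR endpoints stated in the text (200 to 2133Mbps over 13 years) and assumes, as the text suggests, that LPDDR achieved the same ratio over nine years. The function and variable names are illustrative. It also checks the constant-power observation, on the basis that interface power is data rate multiplied by energy per bit.

```python
def annual_growth(start, end, years):
    """Compound annual growth rate implied by moving from start to end in `years`."""
    return (end / start) ** (1.0 / years) - 1.0

# Endpoints stated in the text for DDR (2000-2013).
DDR_START_MBPS = 200.0
DDR_END_MBPS = 2133.0

ddr_rate = annual_growth(DDR_START_MBPS, DDR_END_MBPS, 13)
# Assumption: LPDDR saw the same ratio, but over nine years (2005-2014).
lpddr_rate = annual_growth(DDR_START_MBPS, DDR_END_MBPS, 9)

# Power = data rate x energy per bit, so a 10x bandwidth gain paired with a
# 10x energy-per-bit reduction leaves power roughly unchanged.
bandwidth_factor = 10.0
energy_per_bit_factor = 1.0 / 10.0
relative_power = bandwidth_factor * energy_per_bit_factor

print(f"DDR bandwidth growth:   {ddr_rate:.1%} per year")
print(f"LPDDR bandwidth growth: {lpddr_rate:.1%} per year")
print(f"Relative power after the period: {relative_power:.2f}x")
```

Under these assumptions the rates come out at roughly 20% per year for DDR and 30% per year for LPDDR, which illustrates how much faster the mobile memory interface has had to scale.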

The growth of lower-end smartphones, together with the highly competitive nature of the market, is putting downward pressure on costs. As a result, the same high-bandwidth memory solutions used in high-end smartphones today must be delivered by the mid-range and eventually low-end devices of tomorrow, albeit at lower implementation costs.

Figure 2. Smartphone Unit Shipment Growth by Price Range (million units per year, 2010-2020, for the $0-199, $200-399 and $400+ price ranges). Source: ABI Research

Signal and Power Integrity Challenges

Trends in mobile memory pose tremendous challenges in signal and power integrity to SoC and system developers. Higher data rates lead to smaller bit times and faster signal transitions. The former results in a smaller timing budget, while the latter creates larger signal degradations due to loss and reflection. Together they significantly reduce system timing margin. In addition, lower energy per bit requires lower supply voltages and smaller signal swings, both leading to smaller system voltage margin.

To minimize power consumption, low-power, high-performance memory systems utilize aggressive power management as well as circuits optimized for power. Aggressive power management creates frequent power state transitions, increasing both the magnitude and the probability of worst-case supply noise. Circuit optimization for power, on the other hand, often comes at the expense of increased timing jitter sensitivity to supply noise. Together, they further exacerbate the problem of shrinking timing and voltage margin.

Table 1 shows the trend of shrinking timing and voltage margins as the LPDDR data rate increases. The data signal (DQ) margin of a read operation is used for illustration purposes. Timing Margin for Rest of Channel and DRAM Output Swing provide measures of the timing and voltage margins, respectively, available to the controller receiver, package and PCB after accounting for the DRAM specifications.
Table 1: DQ Timing and Voltage Margin vs. LPDDR Data Rate

| Standard | Data Rate (Mbps) | Bit Time (ps) | DRAM max tDQSQ (ps) | DRAM min tQH (ps) | Timing Margin for Rest of Channel (ps) (Note 1) | DRAM Output Swing (mV) (Note 2) |
| LPDDR2 | 200 | 5000 | 700 | 2800 | 2100 | 96 |
| LPDDR2 | 800 | 1250 | 240 | 670 | 430 | 960 |
| LPDDR3 | 1600 | 625 | 135 | 475 | 340 | 800 |
| LPDDR3 | 2133 | 468.8 | 100 | 356.4 | 256.4 | 690 |
| LPDDR4 | 3200 | 312.5 | TBD | TBD | TBD | 350-400 |

Note 1: Timing Margin for Rest of Channel = (Bit Time) - [tDQSQ + (Bit Time - tQH)]
Note 2: DRAM Output Swing assumes ODT = 240 Ω at 1600Mbps and 120 Ω at 2133Mbps
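The timing columns of Table 1 can be reproduced from the data rate and the two DRAM parameters. The sketch below, in Python, applies the formula from the table's first note, which simplifies to tQH minus tDQSQ. The row values are copied from the table; the function names are illustrative and not part of any specification.

```python
# (standard, data rate in Mbps, max tDQSQ in ps, min tQH in ps), from Table 1.
TABLE_1_ROWS = [
    ("LPDDR2", 200, 700.0, 2800.0),
    ("LPDDR2", 800, 240.0, 670.0),
    ("LPDDR3", 1600, 135.0, 475.0),
    ("LPDDR3", 2133, 100.0, 356.4),
]

def bit_time_ps(data_rate_mbps):
    """Bit time in picoseconds for a data rate given in Mbps."""
    return 1e6 / data_rate_mbps

def timing_margin_ps(bit_time, tdqsq, tqh):
    """Margin left for the rest of the channel, per the note under Table 1."""
    return bit_time - (tdqsq + (bit_time - tqh))

margins = []
for standard, rate, tdqsq, tqh in TABLE_1_ROWS:
    bit_time = bit_time_ps(rate)
    margin = timing_margin_ps(bit_time, tdqsq, tqh)
    margins.append(margin)
    # Express the margin as a fraction of the bit time as well.
    print(f"{standard} @ {rate} Mbps: bit time {bit_time:.1f} ps, "
          f"margin {margin:.1f} ps ({margin / bit_time:.0%} of bit time)")
```

Because the bit time cancels out of the formula, the remaining margin depends only on the gap between tQH and tDQSQ, and that gap shrinks from 2100ps to about 256ps across the rows shown.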

Diminishing device footprint and reliance on low-cost packaging solutions to meet optimal price points for mid-range mobile devices raise several challenges that mobile device developers must contend with. More functionality is now packed into smaller form factors and demands increasingly tight integration. This results in declining pin pitch and spacing, which translates into more crosstalk and supply noise coupling. Such degradations are further magnified by the use of low-cost packaging solutions driven by system cost constraints. This is especially true of wire-bond packaging for the DRAM, which has much higher self and mutual inductance than flip-chip packaging.

Another challenge posed by packaging is found in high-end mobile devices, where small form factors and tight integration lead to the adoption of PoP packaging. Figure 3 shows the typical construction of a PoP application processor. While the upper package houses a mix of DRAM and flash dice connected to the substrate with bond wires, the lower package carries the application processor, flip-chip connected to the substrate.

PoP packaging poses some unique electrical challenges. First, power distribution to the upper memory package must flow through the lower application processor package, and both packages have inferior electrical properties due to narrow inductive traces and insufficient signal reference planes. This results in both large supply noise and large supply noise coupling. Second, to create a cavity to accommodate the lower application processor chip, pin placement on the upper package is constrained to the periphery, limiting the number of pins available. As the application processor die grows in size to expand functionality, or the number of memory signal pins increases to support higher memory bandwidth, the package size must grow or the pin pitch must be reduced, increasing both cost and coupling.
Figure 3. PoP Construction and Footprint (upper package: stacked DRAM or flash dice with an optional spacer on the upper package substrate; lower package: application processor on the lower package substrate)

Figure 4 depicts an LPDDR3 footprint comparison between three implementations:

1) JEDEC standard footprint: 12x12mm, 0.4mm pitch, 216 BGA
2) Samsung Exynos 5 Octa found in Galaxy S4: 14x14mm, 0.35mm pitch, 296 BGA
3) Apple A7 found in iPhone 5S: 14x15.5mm, 0.35mm pitch, 456 BGA

To overcome the challenges of using PoP for application processors, designers of both the Samsung and Apple devices chose a bigger package, a smaller pin pitch and a higher pin count than the JEDEC specification to implement two x32 channels of LPDDR3 rated for 1600Mbps operation.

Mobile devices geared toward the lower end of the market present another set of challenges that mobile designers must address. These challenges are mainly due to commoditization trends and downward cost pressures, which lead to the adoption of low-cost manufacturing with less process control and larger manufacturing variations. This, in turn, results in increasing variations in system margin. Figure 5 depicts the variation in passive insertion loss of a memory channel due to manufacturing variations of the controller and DRAM packages, as well as the PCB [2]. For 3.2Gbps operation with a Nyquist frequency of 1.6GHz, the variation in insertion loss is up to 20%, which, based on simulation of this particular channel, translates into timing jitter variations of more than 0.2UI (unit interval, or bit time).

Figure 5. Insertion Loss Variations due to Manufacturing Variations

The gradual introduction of sophisticated techniques into standards-based memory solutions to address these challenges is a testimony to their increasing significance. For example, optional termination and some data and command bus training have been introduced in LPDDR3 to improve signal integrity and reduce static timing offsets, while additional training and internal Vref calibration are expected to be introduced in LPDDR4 to reduce additional offsets and improve timing and voltage margin.
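The Figure 5 discussion quotes a Nyquist frequency and a jitter variation in unit intervals (UI). The short Python sketch below shows how those quantities relate to the data rate. It assumes one bit per symbol, which is what the 3.2Gbps and 1.6GHz pairing in the text implies; the helper names are illustrative.

```python
def nyquist_ghz(data_rate_gbps):
    """Nyquist frequency in GHz: half the bit rate for one bit per symbol."""
    return data_rate_gbps / 2.0

def unit_interval_ps(data_rate_gbps):
    """One unit interval (bit time) in picoseconds."""
    return 1000.0 / data_rate_gbps

def ui_to_ps(fraction_ui, data_rate_gbps):
    """Convert a jitter figure expressed in UI into picoseconds."""
    return fraction_ui * unit_interval_ps(data_rate_gbps)

rate_gbps = 3.2
jitter_ui = 0.2  # lower bound of the jitter variation quoted in the text

print(f"Nyquist frequency: {nyquist_ghz(rate_gbps):.1f} GHz")
print(f"Unit interval:     {unit_interval_ps(rate_gbps):.1f} ps")
print(f"{jitter_ui} UI of jitter:  {ui_to_ps(jitter_ui, rate_gbps):.1f} ps")
```

At 3.2Gbps the unit interval is 312.5ps, so a jitter variation of more than 0.2UI corresponds to more than 62.5ps, a sizeable share of the bit time before any other impairment is counted.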

Design Analysis for Power and Performance Challenges and Solutions

Signal and power integrity challenges are prominent in the implementation of low-power, high-performance memory systems. Design analysis must evolve to a more holistic approach in order to better manage shrinking timing and voltage margins. The conventional waterfall design approach, where different parts of the system are designed sequentially, is no longer acceptable. In optimizing the cost/performance of an upstream component such as a chip, too much margin may be allocated to it to reduce its cost, leaving little for the components downstream, which can drastically increase their costs. To circumvent this, the entire system consisting of chips, packages and PCBs should be designed in parallel so that the different components can be optimized concurrently to guarantee robust systems at reasonable costs.

The implication is that traditionally disparate EDA tools, which have been separately tailored towards chip, package and PCB design, must converge to provide a concurrent design platform with feedback paths to account for mutual interactions. For example, PCB layout optimization may require re-optimization of the package pin assignments, which, in turn, will require re-optimization of the package layout. Moreover, an efficient simulation methodology is required to enable fast yet accurate analysis of the entire system, both to drive design optimization and for verification. The challenge is not only in handling a large and complex model encompassing all parts of the system, but also in accurately modeling design features with dimensions spanning orders of magnitude, from deep sub-micron for chips to millimeters for PCBs.
One solution is to use a divide-and-conquer approach where the same system model is optimized in different ways based on the analysis performed. For example, thorough power integrity analysis includes simulation of static IR drop as well as medium and high frequency AC noise. Since the chip design dominates both static IR drop and high frequency AC noise, the package and PCB models can be simplified for these simulations. On the other hand, since medium frequency AC noise is dominated by the package and PCB, the chip model can be simplified in its simulation.

As a corollary to the co-design requirement, signal and power integrity optimization must occur at every stage of the design cycle. Otherwise, later-stage optimization can become costly and it may be necessary to iterate the design, significantly increasing time-to-market. To address this challenge, intelligence must be built into design tools to enable SI/PI analysis even before the design takes shape. For example, the assignment of flip-chip bumps can have a large impact on current distribution among bumps and thus on both power supply IR drop and electro-migration. However, most off-the-shelf PI analysis tools require a layout before analysis can be performed. Therefore, after initial bump assignments are made, the designer must create a preliminary layout before IR drop can be estimated. If IR drop is then determined to be excessive, correction may require bump assignment re-optimization and hence iteration of the layout, leading to increased design time.

A second implication of the shrinking timing and voltage margins is that the conventional budgeting and analysis approach, where the worst-case values of different components are combined with no probabilistic consideration to determine final system margins, is also inadequate.
Such an approach can result in either a large negative budget or unrealistically tight component specifications, leading to over-designed and expensive systems. To avoid this pitfall, a statistical approach needs to be introduced into the analysis methodology to minimize pessimism while accounting for worst-case variations. In other words, a statistical model should be created for each parameter, which is then analyzed in conjunction with the models for the other parameters to determine the statistical distribution of the timing and voltage margins of the system. The robustness of the system can then be quantified by computing the probability of the system having non-zero margins.

Figure 6 shows a comparison in simulated timing jitter of a 3.2Gbps memory system [3]. In the worst-case simulation, each parameter in the system model can take on its low, typical and high value with equal probability. A full factorial simulation is run using every possible combination of parametric values, and the resultant timing jitter distribution is extracted. The statistical simulation, on the other hand, uses a linear regression model based on the Taguchi Design of Experiment method and uses more realistic probability distributions for the model parameters. The worst-case approach predicts a 6σ timing jitter of >173ps, which is more than 15% larger than that from the statistical simulation.

Figure 6. a) Statistical vs. b) Worst-Case Timing Jitter Simulation

Conclusion

As mobile devices continue their rapid trajectory towards increased performance and power efficiency, packed into shrinking form factors and manufactured at low cost, memory system analysis methodologies must evolve to address signal and power integrity challenges in terms of:

- Decreasing timing and voltage margins
- Escalating circuit jitter sensitivity to supply noise
- Rising supply noise
- Increasing signal and supply noise coupling
- Growing signal integrity degradations due to manufacturing variations

To combat shrinking timing and voltage margins, more holistic and statistical approaches to design and analysis must be adopted to optimize design for system costs, minimize design iterations and shorten time-to-market.
As a result, signal and power integrity analysis must start earlier in the design cycle, and chips, packages and PCBs should be designed in parallel and analyzed statistically.

Part 2 of this whitepaper will examine the challenges and solutions of validating low-power, high-performance memory systems, including:

- Growing manufacturing variations due to cost constraints
- Increasing significance of the long-term impact of circuit random timing jitter
- Difficulty in extrapolating measurement results from probing to account for manufacturing variations and long-term jitter
- Inability of pass-or-fail functional testing to quantify system robustness
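The contrast between worst-case and statistical budgeting described in the design analysis section can be illustrated with a toy model. The Python sketch below is not the method of reference [3]: it assumes a purely linear jitter model with four hypothetical parameters and invented sensitivities, and for the statistical case it assumes independent Gaussian parameters whose 3σ points sit at the low and high extremes. The worst-case case follows the text, with every low/typical/high combination enumerated and weighted equally.

```python
import itertools
import math
import statistics

# Hypothetical sensitivities: jitter shift (ps) when a parameter moves from
# its typical value to its extreme. These are not values from the paper.
SENSITIVITY_PS = [20.0, 15.0, 10.0, 5.0]

def full_factorial_jitter(sensitivities):
    """Jitter offset for every low/typical/high combination (levels -1, 0, +1)."""
    levels = (-1, 0, 1)
    return [
        sum(s * level for s, level in zip(sensitivities, combo))
        for combo in itertools.product(levels, repeat=len(sensitivities))
    ]

def statistical_sigma(sensitivities, sigma_fraction=1.0 / 3.0):
    """Sigma of the linear model for independent Gaussian parameters whose
    3-sigma point equals the extreme (an assumption of this sketch)."""
    return math.sqrt(sum((s * sigma_fraction) ** 2 for s in sensitivities))

samples = full_factorial_jitter(SENSITIVITY_PS)
worst_case_sigma = statistics.pstdev(samples)  # equal weight on every corner
stat_sigma = statistical_sigma(SENSITIVITY_PS)

print(f"Combinations simulated: {len(samples)}")
print(f"Worst-case 6-sigma jitter:  {6 * worst_case_sigma:.1f} ps")
print(f"Statistical 6-sigma jitter: {6 * stat_sigma:.1f} ps")
```

Giving the extreme corners the same weight as the typical case inflates the spread, which is the pessimism the statistical approach is meant to remove. How large the gap is in a real system depends on the actual parameter distributions and on how well a regression model fits the channel.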

References

[1] ABI Research, ABI(MD-MDMT-159)_Mobile Device Shipments_08Jan14.xlsm
[2] Yip, Wai-Yeung, Scott Best, Wendemagegnehu Beyene, Ralf Schmitt, "System Co-design and Co-Analysis Approach to Implementing the XDR Memory System of the Cell Broadband Engine Processor," ASP-DAC 2007
[3] Beyene, Wendemagegnehu, Newton Cheng, June Feng, Chuck Yuan, "Statistical and Sensitivity Analysis of Voltage and Timing Budgets of Multi-Gigabit Interconnect Systems," DesignCon 2004