Architectural Consideration for 100 Gb/s/lane Systems


Architectural Consideration for 100 Gb/s/lane Systems
Ali Ghiasi (Ghiasi Quantum), Feng Hong (Huawei), Xinyuan Wang (Huawei), Yu Xu (Huawei)
IEEE Meeting, Rosemount, February 7, 2018

Overview
- High capacity systems based on 112G/lane electrical will test conventional cooling limits and will come at a cost premium.
- 112G/lane electrical is necessary to enable next-generation routers and high capacity data center switches.
- The cost benefit of 112G systems may only be realized in large-scale applications requiring the highest capacity.
- Most important for initial 112G system deployment are:
  - C2M supporting at least 200 mm of PCB trace
  - C2C supporting at least 400 mm + 1 connector
  - Re-use of RS(544, 514)
- The study group should also consider defining a 0.5 m conventional or 1 m cabled backplane with 25 dB ball-to-ball or 35 dB bump-to-bump loss (assuming 5 dB package loss).
  - Both RS(544, 514) and stronger FEC should be studied.
- The study group may also consider a Cu cabling solution, with the following caveats:
  - Cu cabling should not compromise C2M PCB trace length.
  - High-radix 256-port switches significantly reduce the 1st-level switch-to-server use case, given Cu cable reach is <3 m.
  - Extra retimers and higher power LR SerDes on the host raise the system's maximum operating power.
  - Active-Cu/AOC doesn't raise the maximum system operating power, as the retimer in the active-Cu/AOC replaces a higher power SMF module.
- Given the level of support for 2 m Cu DAC, one option to explore is an asymmetrical link optimized for switch-to-server, without compromising TOR switch PCB reach.

100G/lane System Concerns: Power and Cost Challenges
- Cost/Gb and power/Gb are increasing with the migration from 25G to 50G and 100G.
- CMOS technology scaling has slowed down.
- 100G/lane system power may exceed the limits of conventional air cooling.
- 100G/lane is required for next-gen routers and leading-edge hyperscale, but may not be the answer for every data center!

[Figure: SerDes power (mW per lane) vs. system bandwidth (Gbps), showing a power wall from power/cooling limitations; analog-based SerDes (FEC not needed, e.g. 12G@28nm) vs. ADC-based SerDes (FEC mandatory, e.g. 112G@7nm), with ~4.5x system power (Watt) growth. PCB cost, yesterday/now/future: higher speed introduces higher channel loss and needs more expensive PCB material (FR4 → M4 → M6 → M7).]

112G Electrical Backplane: Innovations are Needed for Both the Passive Channel and the SerDes

[Figure: Is electrical still a viable solution? Starting from today's 56G electrical backplane with an electrical switch, the candidate paths are: a 112G electrical backplane (innovation: channel improvement!), a cabled 56 Gbps electrical backplane, or a 112 Gbps optical backplane / optical interconnect (will optical replace electrical? optical IO with an electrical switch, or optical IO with an optical switch?). The challenge: COST!]

C2M Applications
- Numerous studies in IEEE and OIF have shown a typical line card requires about 200-250 mm of host trace.
- The CAUI-4 loss budget is 10.2 dB, supporting ~125 mm on mid-grade PCB material like Isola 408HR.
- Most line card implementations prefer not to use a retimer, to save power, and instead use Megtron 6-like material to extend CAUI-4 PCB reach to ~250 mm.
- A C2M channel supporting only ~125 mm, even assuming the best PCB material like Megtron 7 or Tachyon 100, would not meet C2M applications.
- C2M applications need to support at least 200 mm on material such as Megtron 7/Tachyon.

[Figure: line card topologies: a switch/ASIC driving ~200-250 mm of host trace directly, vs. ~125 mm traces requiring retimers (R) on each port.]

C2M Needs Practical PCB Trace Length and Construction
- The TE OSFP channel data is an example of a well-built C2M channel: http://www.ieee802.org/3/100gel/public/18_01/tracy_100gel_01a_0118.pdf
- But the laser micro-via is not feasible for complex boards with several routing layers.
- The 2X cal trace showed 1.36 dB/in loss @28 GHz (~1.3 dB/in @26.55 GHz).
- An 8.5 in host channel on Megtron 7 HVLP + OSFP connector + 1.5 in plug PCB has a loss of ~15 dB @26.5 GHz.

[Figure: channel topology with 8.5 in host trace, OSFP connector, and 1.5 in plug PCB.]
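The ~15 dB figure can be cross-checked from the per-inch loss quoted above. The sketch below is a rough tally, assuming the OSFP connector contributes ~2 dB (the value footnoted on the next slide) and that the 1.5 in plug PCB runs at the same ~1.3 dB/in as the host trace; both allocations are assumptions, not TE data.

```python
# Rough cross-check of the ~15 dB @ 26.5 GHz C2M channel estimate above.
loss_per_in = 1.3          # dB/in at ~26.55 GHz (from the 2X cal trace)
host = 8.5 * loss_per_in   # 8.5 in host trace  -> ~11.1 dB
plug = 1.5 * loss_per_in   # 1.5 in plug PCB    -> ~2.0 dB (assumed same dB/in)
connector = 2.0            # assumed OSFP connector loss, dB
print(f"total ~= {host + plug + connector:.1f} dB")  # ~15.0 dB
```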

PCB Loss Estimate: Assumptions and Tools for Calculating C2M Channel Reach
- Rogers Corp impedance calculator (free download but requires registration): https://www.rogerscorp.com/acm/technology/index.aspx
- The IEEE tool, if updated, could be another option to estimate channel reach: http://www.ieee802.org/3/bj/public/tools/reference DkDf_AlegbraicModel_v2.04.pdf
- Stripline, ~50 Ω, trace width 5.5 mils, ½ oz Cu:
  - Isola 408HR: Dk=3.65, Df=0.0095, RO=2.5 µm
  - Megtron 6: Dk=3.4, Df=0.005, RO=1.2 µm
  - Tachyon 100: Dk=3.02, Df=0.0021, RO=1.2 µm
- To support equivalent PCB traces for C2M, at least 15 dB of end-to-end channel loss is needed, consistent with tracy_100gel_01a_0118.

| | Total loss (dB) | Host loss (dB) | Host trace length (in), Isola 408HR | Megtron 6 | Tachyon 100 |
|---|---|---|---|---|---|
| Nominal PCB loss/in at 5.15 GHz | N/A | N/A | 0.65 | 0.52 | 0.46 |
| Nominal PCB loss/in at 13 GHz | N/A | N/A | 1.27 | 0.98 | 0.83 |
| Nominal PCB loss/in at 27 GHz | N/A | N/A | 2.18 | 1.60 | 1.28 |
| 28G-VSR with one connector & HCB* | 10.5 | 6.81 | 5.4 | 6.9 | 8.2 |
| lim_100gel_adhoc_01_022618 proposed (reach in inches too short) | 11.7 | 7.2 | 3.3 | 4.5 | 5.6 |
| Current 112G-VSR draft + one connector & HCB** | 13.5 | 9 | 4.1 | 5.6 | 7.0 |
| 100G C2M by scaling 28G + connector & HCB** | 15 | 10.5 | 4.8 | 6.6 | 8.2 |

* Assumes connector loss of 1.69 dB and HCB loss of 2.0 dB at 12.89 GHz.
** Assumes connector loss of 2.0 dB and HCB loss of 2.5 dB at 27 GHz.
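As a sanity check, the reach numbers in the table follow directly from dividing the host loss budget by the per-inch material loss; the short sketch below (illustration only) reproduces the 27 GHz rows using the footnoted connector and HCB allowances.

```python
# Host reach (in) = host loss budget (dB) / material loss per inch (dB/in).
loss_per_in_27ghz = {"Isola 408HR": 2.18, "Megtron 6": 1.60, "Tachyon 100": 1.28}

def host_reach(total_loss_db, connector_db=2.0, hcb_db=2.5):
    """Host budget after one connector + HCB (** footnote values at 27 GHz)."""
    host_db = total_loss_db - connector_db - hcb_db
    return {m: round(host_db / lpi, 1) for m, lpi in loss_per_in_27ghz.items()}

print(host_reach(13.5))  # current 112G-VSR draft: 4.1 / 5.6 / 7.0 in
print(host_reach(15.0))  # 100G C2M by scaling 28G: 4.8 / 6.6 / 8.2 in
```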

Evolution of Front Panel Ports

[Figure: pluggable ports at 25 Gb/s and 50 Gb/s vs. pluggable ports at 100 Gb/s; ~200 mm host traces, 15 dB channel loss.]

PHY-less design at 25/50 Gb/s (what we are used to):
- Supports passive Cu DAC
- Switch directly drives optical modules
- Switch directly drives 3 m of Cu DAC
- Offers optimum power and cost.

Option I, PHY-less design (channel loss 15 dB):
- Supports AOC, active DAC, and optics; doesn't support passive Cu DAC
- 15 dB loss supports at least 200 mm of PCB trace on premium material such as Megtron 7/Tachyon PCB
- Offers improved power and cost
- Better choice for MOR/spine switches

Option II, requires PHY (channel loss 10 dB):
- Given that high-radix switches used as TOR require connecting servers across 4-6 racks, passive DAC is no longer feasible
- Low-capacity switches that can serve a single server rack can just stay with 50G signaling
- Adding a 100G retimer, assuming 1 W/lane, on a system with 16 line cards, each based on 256x100G, will add a whopping 4 kW to the system power envelope (see the sketch below)!
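The 4 kW figure follows from straightforward multiplication; a minimal check using the slide's stated assumptions:

```python
# Back-of-envelope check of the retimer power claim above.
retimer_w_per_lane = 1.0      # assumed 1 W/lane 100G retimer
lanes_per_card = 256          # 256 x 100G line card
line_cards = 16
added_power = retimer_w_per_lane * lanes_per_card * line_cards
print(f"{added_power / 1000:.1f} kW added")  # ~4.1 kW
```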

Datacenter Trends
- Switch radix over the last 9 years has increased from 64x10G to 128x25G, now to 256x50G, and likely to 256x100G by 2019/2020.
- To mitigate full-rack failure, dual MOR switches may connect to each rack.

Assuming 3:1 oversubscription:
- 640G TOR (64x10G): 16 uplinks to EOR switch; 48 downlinks to 48 1-RU 10G servers
- 3.2T MOR (128x25G): 32 uplinks to EOR switch; 96 downlinks to 96 25G servers, connecting 2 racks
- 12.8T MOR (256x50G): 64 uplinks to EOR switch; 192 downlinks to 50G servers, connecting 4-8 racks
- 25.6T MOR (256x100G): 64 uplinks to EOR switch; 192 downlinks to 50G servers, connecting 4-8 racks
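The uplink/downlink counts above are consistent with splitting each radix 1:3 between uplinks and downlinks; a minimal sketch (the formula is inferred from the numbers, not stated on the slide):

```python
# Radix split under an N:1 oversubscription ratio (downlinks : uplinks).
def split(radix, ratio=3):
    uplinks = radix // (ratio + 1)
    return uplinks, radix - uplinks  # (uplinks, downlinks)

for radix in (64, 128, 256):
    print(radix, split(radix))  # 64 -> (16, 48), 128 -> (32, 96), 256 -> (64, 192)
```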

Study Performed by Joel Goergen in 802.3by Indicates 3 m is Necessary for Cu Cable!
- Given that high-radix switches can connect to 4-6 racks of servers, passive Cu cable is no longer a viable option for 1st-level switch-to-server links.
- The potential use case for Cu cables at 112G will be switch-to-switch, and there one may not assume an asymmetrical link.
- Applications not driven by network performance may use a small TOR switch within the rack for simplicity and use 25G/50G Cu cabling!
http://www.ieee802.org/3/by/public/july15/goergen_3by_02a_0715.pdf

The 50G/lane Interconnect Ecosystem
- OIF has defined both NRZ and PAM4 for MR, VSR, XSR, and USR.
- IEEE P802.3bs and P802.3cd are defining PAM4 signaling for 50G/lane chip-to-chip, chip-to-module, Cu DAC, and backplane.
- An LR SerDes operating at 29 GBd may have 37 dB of loss from bump to bump!

| Application | Standard | Modulation | Reach | Ball-ball loss | Bump-bump loss |
|---|---|---|---|---|---|
| Chip-to-OE (MCM) | OIF-56G-USR | NRZ | <1 cm | 2 dB@28 GHz | NA |
| Chip-to-nearby OE (no connector) | OIF-56G-XSR | NRZ / PAM4 | <7.5 cm (1) | 8 dB@28 GHz / 4.2 dB@14 GHz | 12.2 dB@14 GHz / 4.2 dB@14 GHz |
| Chip-to-module (one connector) | OIF-56G-VSR / IEEE CDAUI-8 | NRZ/PAM4 / PAM4 | <10 cm (2) / <20 cm | 18 dB@28 GHz / 10 dB@13.3 GHz | 26 dB@28 GHz / 14 dB@13.3 GHz |
| Chip-to-chip (one connector) | OIF-56G-MR / IEEE CDAUI-8 | NRZ/PAM4 / PAM4 | <50 cm / <50 cm | 35.8 dB@28 GHz / 20 dB@13.3 GHz | 47.8 dB@28 GHz (3) / 26 dB@13.3 GHz |
| Backplane (two connectors) | OIF-56G-LR / IEEE 200G-KR4 | PAM4 / PAM4 | <100 cm / <100 cm | 30 dB@14.5 GHz / 30 dB@13.3 GHz | ~37 dB@14.5 GHz (4) / 36 dB@13.3 GHz |

(USR and XSR are defined in OIF only; the remaining interfaces are defined in both IEEE and OIF.)

1. The OIF XSR definition is likely too short for any practical OBO implementation!
2. The OIF VSR 10 cm reach assumes 10 cm of mid-grade PCB, but a typical implementation uses Meg6/Tachyon 100 with ~25 cm!
3. Includes 2x6 dB for package loss, but 47.8 dB seems beyond equalization capability.
4. Includes 2x3.5 dB for package loss.
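Footnotes 3 and 4 amount to adding twice the per-package loss to the ball-ball number; a quick check:

```python
# bump-bump = ball-ball + 2 x per-package loss (per table footnotes 3 and 4)
print(35.8 + 2 * 6.0)   # 47.8 dB  (chip-to-chip, 6 dB packages)
print(30.0 + 2 * 3.5)   # 37.0 dB  (backplane, 3.5 dB packages)
```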

The 100G/lane Ecosystem Will Follow the 50G Ecosystem
- With an estimated loss of 18 dB, the VSR specification is in line with our definition of MR.
- Bump-to-bump loss is calculated by assuming an ASIC package with 5 dB loss.
- The PCB reaches below assume Tachyon 100/Megtron 7.
- Bump-bump loss for the LR SerDes is reduced by 1 dB from 50G PAM4 to account for additional impairments related to crosstalk, reflection, and ILD.

| Application | Standard | Modulation | Reach | Ball-ball loss | Bump-bump loss |
|---|---|---|---|---|---|
| Chip-to-OE (MCM) | TBD | PAM4 | <1 cm | NA | 2 dB |
| Chip-to-nearby OE (no connector) | TBD | PAM4 | <10 cm* | 5 dB | 12 dB |
| Chip-to-module (one connector) | C2M | PAM4 | <20 cm** | 15 dB | 20 dB |
| Chip-to-chip (one connector) | C2C | PAM4 | <40 cm | 20 dB | 30 dB |
| Cabled backplane (two connectors) | LR | PAM4 | <50 cm | 25 dB | 35 dB |

(All defined in both OIF/IEEE.)

* OBO connector + CDR package assumed to have 2 dB loss.
** C2M host package assumed 5 dB loss; the CDR package is assumed to reuse the 2 dB HCB loss.
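A sketch of how the bump-bump column appears to be composed from the ball-ball column plus the footnoted package assumptions; the per-row composition is inferred from the slide's notes, not stated explicitly:

```python
# bump-bump composition under the slide's stated package assumptions
asic_pkg = 5.0                # assumed ASIC package loss, dB
print(20.0 + 2 * asic_pkg)    # C2C: 20 dB ball-ball + two packages -> 30 dB
print(25.0 + 2 * asic_pkg)    # LR:  25 dB + two packages           -> 35 dB
print(5.0 + asic_pkg + 2.0)   # nearby OE: 5 dB + ASIC pkg + 2 dB OBO/CDR -> 12 dB
print(15.0 + asic_pkg)        # C2M: 15 dB + host pkg -> 20 dB (CDR reuses HCB loss)
```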

A Possible Path Forward is to Optimize the 2 m Cu DAC for Switch-to-Server
- The proposed host loss in Ghiasi_100GEL_01_0318 is 10.5 dB, vs. 8 dB in lim_100gel_01_0318.
- Given that the primary application of 2 m Cu DAC is switch to server:
  - Limit NIC PCB loss to 4 dB, allocate +2.5 dB to the switch PCB, and use the 1.5 dB excess budget for a more robust 2 m Cu cable.
  - With a 28.5 dB ball-to-ball budget, one could support 4-5 dB of switch package loss and 2-3 dB for the NIC.

[Figure: asymmetric link budget diagram: switch host 10.5 dB and NIC 4.0 dB around a 16.4 dB cable assembly, with 2.5 dB allocations; 16.4 + 10.5 + 4 − (2×1.2) = 28.5 dB ball-ball loss. For comparison, lim_100gel_01_0318 shows 15 dB / 9.5 dB / 5.7 dB / 2.5 dB allocations.]
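Read as arithmetic, the figure's budget only balances if the (2×1.2) dB term is subtracted; the sketch below makes that explicit, with the interpretation of that term (e.g. a mated test-fixture allowance) left as an assumption:

```python
# The asymmetric ball-ball budget from the figure, written out as arithmetic.
cable = 16.4                 # 2 m Cu DAC assembly loss, dB (from the figure)
switch_host, nic_host = 10.5, 4.0
fixture_credit = 2 * 1.2     # the (2 x 1.2) dB term; interpretation assumed
print(cable + switch_host + nic_host - fixture_credit)  # 28.5 dB ball-ball
```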

Summary
- The primary applications that will benefit from 112G are high-capacity routers delivering the capacity needed for 5G networks, and high-radix switches enabling next-generation hyperscale data centers.
  - Managing power and cost will be the key challenge for these types of systems.
- What is necessary to enable these next-generation systems based on 112G/lane electrical IO:
  - C2M with at least 200 mm PCB (15 dB) support
  - C2C with at least 400 mm PCB (20 dB ball-ball)
  - Reuse of RS(544, 514) for the C2M and C2C interfaces
- Backplane applications based on 0.5 m conventional PCB or 1 m cabled backplane with 35 dB loss should also be considered, as long as this does not delay the C2M and C2C development.
  - For the backplane application, both RS(544, 514) and stronger FEC should be considered.
- Cu cable since the introduction of SFP+ Cu DAC has been a huge success, but the introduction of switches and QSFP-DD/OSFP supporting 256 lanes has diminished the value of Cu DAC for TOR-to-server applications.
  - Cu cable should be considered in this project as long as it does not sacrifice C2M PCB reach.
- How to move forward without sacrificing C2M PCB reach while supporting a 2 m Cu cable objective:
  - Define the optical MDI based on 15 dB loss and the Cu MDI with 10 dB; a port with 10 dB loss can support both Cu and optics.
  - Given that the primary application of 2 m DAC is switch to server, an asymmetrical link budget as shown can support high-density TOR as well as NIC without the need to create superset ports.