LOW POWER DIGITAL EQUALIZATION FOR HIGH SPEED SERDES. Masum Hossain University of Alberta


Outline
- Why an ADC-based receiver?
- Challenges in ADC-based receivers
- ADC-DSP based receiver
  - Reducing the impact of quantization noise
  - Variable-resolution ADC
  - Low-latency, high-resolution TDC-based timing recovery
- Implemented prototype and measured results

Conventional Mixed-Signal Link
- Tx FIR filter: peak-power constrained; limited by supply voltage
- Peaking equalizer: analog, does not scale well; limited by supply voltage; sensitive to PVT variation
- Decision feedback equalizer: latency constrained; difficult for multilevel signaling
Existing equalization strategies do not scale well with technology, channel loss, and data rate.

Mixed-Signal vs ADC-Based Link
[Figure: analog mixed-signal link compared with an ADC-based (digital) high-speed link]
Benefits of DSP-based equalization:
- Scales well with technology
- Frequency response can be well controlled
- Can equalize both pre- and post-cursors
Challenges of DSP-based equalization:
- The ADC and DSP are power hungry
- Higher loop latency makes timing recovery difficult

PAM-4 Digital Receiver Architecture
- Variable-resolution predictive ADC
- 8-tap digital FFE: 3 taps in a look-up table, 5 taps implemented conventionally
- Timing recovery: 3-bit TDC

Variable Resolution ADC: 12 dB Loss
[Figure: normalized step response and comparator references vs. time (bit periods, 0 to 12); transient, data, and edge samples marked; 4 fixed references]
- Between two consecutive samples the signal changes a lot.
- The references must cover the entire dynamic range: 4 fixed references.

Variable Resolution ADC: 25 dB Loss
[Figure: normalized step response and comparator references vs. time (bit periods, 0 to 12); transient, data, and edge samples marked]
- Between two consecutive samples the signal changes by only about 20% to 30%.
- The references need to cover only a portion of the dynamic range: reference switching.

Variable Resolution ADC: 25 dB Loss
[Figure: normalized step response and comparator references vs. time (bit periods, 0 to 12); transient, data, and edge samples marked]
- The edge comparator output defines the next probable location of the references.

Variable Resolution ADC: 25 dB Loss
[Figure: normalized step response and comparator references vs. time (bit periods, 0 to 12); transient, data, and edge samples marked; edge references (2) and fine reference marked]
- Fine references are placed midway between two coarse references.
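
The reference-switching idea on the preceding slides can be illustrated with a small numeric sketch. It assumes a normalized 0-to-1 signal range, four comparator references, and a window of 30% of full scale centred on the previous sample. `place_references` and `quantize` are hypothetical helper names, not part of the presented design, and the real circuit selects its window from the edge comparator rather than from a known previous sample.

```python
def place_references(center, span, n_refs=4, lo=0.0, hi=1.0):
    """Return n_refs evenly spaced thresholds inside a window of width
    `span` centred on `center`, kept inside the [lo, hi] signal range."""
    start = min(max(center - span / 2, lo), hi - span)
    step = span / n_refs
    # Thresholds sit at the centres of n_refs equal sub-intervals.
    return [start + step * (k + 0.5) for k in range(n_refs)]


def quantize(sample, refs):
    """Thermometer-style decision: count the references the sample exceeds."""
    return sum(sample > r for r in refs)


# Fixed references spanning the full range (low-loss case).
fixed = place_references(center=0.5, span=1.0)
# Switched references covering 30% of the range around the previous sample
# (high-loss case, where consecutive samples differ by roughly 20% to 30%).
switched = place_references(center=0.6, span=0.3)

fixed_step = fixed[1] - fixed[0]
switched_step = switched[1] - switched[0]
print("step with fixed references:   ", fixed_step)
print("step with switched references:", switched_step)
```

With the same number of comparators, the narrower window gives a proportionally smaller step, which is the sense in which the resolution is variable.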

Variable Resolution ADC
[Block diagram: sample-and-hold feeding coarse, fine, and edge paths, split into odd and even slices; quad and octal clocks from PGEN blocks; /2 divider with matched delay; 3.5 GHz clock]
- The quad and octal clocks are retimed with the original quad clock.

ADC Offset Correction (Ref: [2])
- Unbalance the capacitive load attached to the input of the strong-arm latch.
- Store the bit decisions in a 6T SRAM to reduce area.

Measured ADC Performance
[Figure only: measured ADC performance plots]


Timing Recovery Challenge for ADC-Based Receiver
[Block diagram: digital FFE, MM phase detector, and digital filter, with signals labeled Φ_Q and Φ_N]
- Mueller-Muller (MM) phase detection is not as robust as a 2x-sampled (data and edge) CDR.
- Bang-bang (1-bit) phase quantization at the phase detector increases in-band jitter.
- Lowering the loop bandwidth increases the VCO phase noise contribution.
- Loop latency makes it difficult to achieve a wider loop bandwidth.

Effect of Timing Noise on SNR
- Timing noise degrades SNR less once channel loss is taken into account.
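
One way to see why channel loss softens the jitter penalty is the standard jitter-limited SNR of a sampled sine, SNR = -20·log10(2π·f·σ_t). The sketch below is an illustration under stated assumptions, not the analysis behind the slide: it assumes the jitter-induced noise scales with the slope of the sampled tone, so a tone attenuated by `loss_db` relative to full scale produces that much less jitter noise. The function name and the example values (7 GHz, 0.5 ps rms, 25 dB) are chosen for illustration.

```python
import math


def jitter_noise_dbfs(freq_hz, sigma_t_s, loss_db=0.0):
    """Jitter-induced noise, in dB relative to a full-scale tone.

    For a full-scale sine the result is 20*log10(2*pi*f*sigma_t), i.e. the
    negative of the jitter-limited SNR. Assumption: attenuating the tone by
    loss_db lowers its slope, and the jitter noise, by the same amount.
    """
    return 20 * math.log10(2 * math.pi * freq_hz * sigma_t_s) - loss_db


no_loss = jitter_noise_dbfs(7e9, 0.5e-12)
with_loss = jitter_noise_dbfs(7e9, 0.5e-12, loss_db=25.0)
print(f"no channel loss:    {no_loss:.1f} dBFS")
print(f"25 dB channel loss: {with_loss:.1f} dBFS")
```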

Phase Tracking vs Blind ADC-Based [Clifford et al., JSSC, 2013]
- Phase tracking: simple but latency sensitive; the ADC benefits from jitter tracking
- Blind: less latency sensitive; the ADC does not benefit from jitter tracking

Low-Latency Timing Recovery
[Figure: diagram with regions labeled Region 0 through Region 3]

Low-Latency Timing Recovery
[Figures: SAR TDC operation; proposed CDR]
Advantages:
- ADC bypass significantly reduces latency.
- The 3-bit SAR TDC reduces bang-bang dithering by 4x.
- Wider loop bandwidth effectively filters VCO phase noise.
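
A successive-approximation phase quantizer can be sketched in a few lines. This is an illustrative model only: it assumes the phase error is expressed in UI over a 1 UI range and that each SAR step halves the search interval. The function names are hypothetical, and the actual TDC circuit is not described in enough detail on the slide to reproduce.

```python
def sar_tdc(phase_error_ui, bits=3, full_scale_ui=1.0):
    """Quantize a phase error in [-full_scale/2, +full_scale/2) with a
    binary search, one bit per step, most significant bit first."""
    lo, hi = -full_scale_ui / 2, full_scale_ui / 2
    code = 0
    for _ in range(bits):
        mid = (lo + hi) / 2
        bit = 1 if phase_error_ui >= mid else 0
        code = (code << 1) | bit
        if bit:
            lo = mid   # error lies in the upper half of the interval
        else:
            hi = mid   # error lies in the lower half of the interval
    return code


def bang_bang(phase_error_ui):
    """1-bit (early/late) phase detector for comparison."""
    return 1 if phase_error_ui >= 0 else 0


# Width of one output bin for each detector over a 1 UI range.
tdc_bin_ui = 1.0 / 2 ** 3
bang_bang_bin_ui = 1.0 / 2
print("bin width ratio:", bang_bang_bin_ui / tdc_bin_ui)
print("codes:", [sar_tdc(p) for p in (-0.5, -0.1, 0.0, 0.2, 0.49)])
```

Under these assumptions a 3-bit code resolves the phase error four times more finely than an early/late decision, which is one plausible reading of the 4x dithering reduction quoted above.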

CDR Performance
[Figure: phase noise, free-running vs. locked; integrated jitter = 0.5 ps; in-band phase noise = -90 dBc/Hz]
[Figure: jitter tolerance (UIpp) vs. frequency (MHz) with a 2^7-1 pattern; equipment limit marked]


Noise Sources in ADC-Based Receiver
[Block diagram: noise sources N_LEQ, N_ADC, N_QZ, and Φ_N in the receive chain with the digital FFE and timing recovery]

Noise source | Constraint          | Transfer gain
N_LEQ        | Power/gain/BW       | LEQ + FFE
Φ_N          | Power and latency   | FFE
N_ADC        | Power/settling time | FFE
N_QZ         | ADC resolution      | FFE

[Figure: flash ADC power (mW, 0 to 300) vs. ADC resolution (2 to 6 bits), Fs = 14 GS/s]
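
One general reason resolution is expensive in a flash converter is that an N-bit flash ADC needs 2^N - 1 comparators. The short sketch below tabulates that count over the 2-to-6-bit range on the power plot; it is a textbook relation, not a model of the measured power curve, and `flash_comparators` is a hypothetical helper name.

```python
def flash_comparators(bits):
    """Number of comparators in an N-bit flash ADC (one per threshold)."""
    return 2 ** bits - 1


for n in range(2, 7):
    print(f"{n}-bit flash: {flash_comparators(n)} comparators")
```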

Quantization Noise Impact
[Block diagram: ADC quantization noise N_QZ passing through the FFE taps h_pre, h_main, and h_post to give N_QZ,out]
$N_{QZ,out}^2 = N_{QPre}^2 W_{Pre}^2 + N_{QMain}^2 W_{Main}^2 + N_{QPost}^2 W_{Post}^2$
where $W_x$, $x \in \{Pre, Main, Post\}$, is the weight set by the FFE tap coefficients.
[Figure: amplitude (dB, 0 to -100) vs. analog input frequency (GHz, 1 to 7); curves: FFT at the ADC output (simulated), FFT at the FFE output (simulated), ADC quantization noise floor (theoretical), quantization noise floor at the FFE output (theoretical)]
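
The way an FFE reshapes the quantization noise floor can be sketched numerically. The example assumes uniform quantization noise of power LSB²/12 that is uncorrelated from sample to sample, and it takes the weights to be the tap coefficients themselves, so the output noise power is the input noise power times the sum of squared taps. The tap values are hypothetical, and the slide's exact definition of the weights may differ.

```python
import math


def quantization_noise_power(n_bits, full_scale=1.0):
    """Noise power of an ideal uniform quantizer: LSB**2 / 12."""
    lsb = full_scale / 2 ** n_bits
    return lsb ** 2 / 12


def ffe_noise_gain(taps):
    """Power gain seen by white, uncorrelated noise through an FIR filter."""
    return sum(h * h for h in taps)


taps = [-0.2, 1.0, -0.3]           # hypothetical pre, main, post cursor taps
noise_in = quantization_noise_power(n_bits=5)
gain = ffe_noise_gain(taps)
noise_out = noise_in * gain

print(f"noise power at ADC output: {noise_in:.3e}")
print(f"FFE noise gain:            {gain:.2f} ({10 * math.log10(gain):.2f} dB)")
print(f"noise power at FFE output: {noise_out:.3e}")
```

Because the main tap dominates the sum of squares, each additional small tap raises the noise floor only slightly, while an FFE that boosts high frequencies with large pre- and post-cursor taps raises it more.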

How to Reduce the Impact of ADC Quantization Noise?
[Block diagram: N-bit samples with quantization noise N_QZ pass through Z^-1 delays and taps h_main and h_post; noise terms N_QMain and N_QPost appear at a 2N-bit output]
- Although the digital FFE output can be 2N bits wide, we are still limited by the ADC's N-bit resolution.
- If the FFE could be moved ahead of the ADC, the ADC's quantization noise penalty could be minimized.
- How can we build a digital FFE with resolution better than the ADC's?

Reducing Quantization Noise Impact
[Block diagram: LUT FFE with address decoder, followed by a conventional FFE; bus widths labeled 5 bit and 9]
- Implementing the first three taps in a LUT reduces the impact of quantization noise.
- Taps 3 to 8 do not significantly amplify quantization noise.
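
The slide does not spell out how the look-up table is organized, so the following is only one plausible reading: the codes entering the first three taps form an address, and the table returns a precomputed weighted sum stored on a finer grid than the ADC step. The tap values, the input codes, and the choice of 4 fractional bits (which, with 5-bit inputs, happens to give a 9-bit word) are assumptions made for illustration, as are the function names.

```python
from itertools import product


def build_lut(taps, n_bits, frac_bits):
    """Precompute the partial FFE sum for every combination of input codes.

    Each entry is rounded once to a grid of 2**-frac_bits ADC LSBs, so the
    stored result is finer than the ADC step itself.
    """
    scale = 2 ** frac_bits
    lut = {}
    for codes in product(range(2 ** n_bits), repeat=len(taps)):
        exact = sum(h * x for h, x in zip(taps, codes))
        lut[codes] = round(exact * scale) / scale
    return lut


def lut_ffe(samples, lut, n_taps):
    """Slide over the ADC codes and look up each output instead of multiplying."""
    return [lut[tuple(samples[i:i + n_taps])]
            for i in range(len(samples) - n_taps + 1)]


taps = [-0.25, 1.0, -0.5]           # hypothetical weights for the three LUT taps
lut = build_lut(taps, n_bits=5, frac_bits=4)
codes = [3, 17, 30, 12, 5]           # hypothetical 5-bit ADC codes
outputs = lut_ffe(codes, lut, n_taps=len(taps))

print("table entries:", len(lut))
print("FFE outputs:  ", outputs)
```

The table grows as 2^(bits × taps), which is why only the first few taps are candidates for this treatment and why the area cost is a concern.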

Reducing Quantization Noise Impact
[Figure: power (axis ticks 50 to 150) vs. number of taps (4 to 10) and tap resolution (4 to 10) for two implementations: 8-tap conventional, and 3-tap LUT + 5-tap conventional]
- The proposed approach consumes 30% less power than a conventional FIR implementation.

Area Impact of the Proposed Solution
[Figure: layouts of the 8-tap conventional FFE and the 3-tap LUT + 5-tap conventional FFE, with dimension labels 500 µm, 1000 µm, 500 µm, and 1300 µm]
- Area increases by 4x, but standard-cell SRAM will reduce it by 25%.
- Area will scale significantly with technology.

Implemented Prototype in 65 nm CMOS
Implemented in TSMC 65 nm.
Power breakdown, long reach: DSP 30 mW, analog 40 mW, TDC 33 mW, digital 29 mW, clock generation + buffer 28 mW
Power breakdown, medium reach: DSP 26 mW, analog 35 mW, TDC 24 mW, digital 26 mW, clock generation + buffer 23 mW
Digital: T-to-B, mode selection
[Block diagram: passive equalizer, high-BW amplifier, retimer, reference generator, P0 HR (fine S/H), P0 (coarse S/H), P315 (edge S/H), even, odd, TDC, 3.5 GHz clock generator, T-to-B blocks, mode selection, channels CH0, CH90, CH180, and CH270, digital interface, DSP, FPGA]

Implemented Prototype in 65 nm CMOS
[Figure: prototype with output to FPGA]
- Heavily digital solution
- Input needs only 7 GHz of bandwidth

Experimental Setup
[Figure: test setup showing matched SMA cables, test PCB, FPGA interface, input clock, and Cyclone V FPGA]
- Channel loss is varied by cascading SMA cables.

Input Eye in Digital Domain
[Figure: frequency responses of the LR, MR, and SR channels; linear equalizer output eye; digital eyes reconstructed from the ADC output (ADC code, 0 to 31, vs. time in UI) for the SR, MR, and LR channels]
- Tx has 6 dB of equalization
- Linear equalizer boost: 6 to 14 dB

Link Margin at 28 Gb/s, 30 dB Channel
[Figure: occurrence and BER vs. equalized output code (levels -3, -1, 1, 3) for the 3-tap LUT + 5-tap conventional FFE and the 8-tap conventional FFE]
- The FPGA gives the distribution of the bins.
- The distribution is converted to log scale.
- A Gaussian fit is used to extract the BER.
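
The extraction of a BER from a histogram can be sketched as follows. This is a simplified stand-in for the procedure on the slide: it fits each level by its weighted mean and standard deviation rather than by a fit in log scale, assumes equal noise on adjacent levels with the decision threshold at their midpoint, and ignores the PAM-4 factor for inner versus outer levels. The histogram values and function names are hypothetical.

```python
import math


def fit_gaussian(codes, counts):
    """Weighted mean and standard deviation of one histogram cluster."""
    total = sum(counts)
    mean = sum(c * n for c, n in zip(codes, counts)) / total
    var = sum(n * (c - mean) ** 2 for c, n in zip(codes, counts)) / total
    return mean, math.sqrt(var)


def gaussian_ber(mu_low, mu_high, sigma):
    """Error probability between two adjacent levels with Gaussian noise."""
    q = (mu_high - mu_low) / (2 * sigma)
    return 0.5 * math.erfc(q / math.sqrt(2))


# Hypothetical histogram of equalized output codes around the +1 level.
mean, sigma = fit_gaussian([0.8, 1.0, 1.2], [1, 2, 1])
print(f"fitted mean = {mean:.3f}, sigma = {sigma:.3f}")

# Adjacent PAM-4 levels at -1 and +1 with sigma = 1/6 give Q = 6.
ber = gaussian_ber(-1.0, 1.0, 1.0 / 6.0)
print(f"estimated error probability = {ber:.2e}")
```

The fit lets the error probability be extrapolated to levels far below what the number of captured samples could measure directly.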

Link Margin Test and Energy Efficiency
Data rate: 28 Gb/s PAM-4
[Figure: BER and power (mW) at 28 Gb/s vs. channel loss (dB); power split into FFE, TDC, and ADC; energy-efficiency labels 4.6, 5.7, 2.1, 2.1, and 3.25 pJ/bit]
- The receiver achieves a BER as low as 10^-9.

Comparison with State of the Art

Parameter | Shafik ISSCC 2015 [4] | Frans VLSI 2016 [5] | Cui ISSCC 2016 [3] | Rylov ISSCC 2016 [6] | This Work
Technology | 65 nm CMOS | 16 nm FinFET | 28 nm CMOS | 32 nm CMOS | 65 nm CMOS
Data rate (Gb/s) | 10 NRZ | 56 PAM-4 | 32 PAM-4 | 25 NRZ | 28 PAM-4
ADC architecture | 32x TI SAR ADC | 32x TI SAR ADC | 32x TI SAR ADC | 4x flash ADC | 4x flash ADC
ENOB @ Nyquist | 4.74 | 4.9 | 5.85 | 4 | 4.1
Timing recovery | N/A | Baud-rate | Baud-rate | Baud-rate | Edge & data sampled
Tracking BW | --- | --- | --- | --- | 10+ MHz
Jitter tolerance | --- | --- | --- | --- | 0.2 UIpp @ 50 MHz
Channel loss equalization | 36.4 dB @ 5 GHz | 25 dB @ 14 GHz | 32 dB @ 8 GHz | 40 dB @ 12 GHz | 30 dB @ 7 GHz
Power (mW) | 79 (w/o DSP), 87 (with DSP) | 410 (w/o DSP) | 320 | 453 | 130 @ 30 dB w/o DSP, 160 @ 30 dB with DSP; 45 @ 15 dB w/o DSP, 60 @ 15 dB with DSP
FOM (pJ/bit) | 8.7 | 7.32 | 10 | 18.12 | 5.71 @ 30 dB with DSP; 2.14 @ 15 dB with DSP
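
The FOM values in the comparison follow from dividing power by data rate, since 1 mW per Gb/s equals 1 pJ/bit. The short check below recomputes them from the power and data-rate figures quoted in the comparison; `energy_per_bit_pj` is a hypothetical helper name.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: mW / (Gb/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit."""
    return power_mw / data_rate_gbps


entries = {
    "Shafik [4], with DSP": (87, 10),
    "Frans [5], w/o DSP": (410, 56),
    "Cui [3]": (320, 32),
    "Rylov [6]": (453, 25),
    "This work, 30 dB with DSP": (160, 28),
    "This work, 15 dB with DSP": (60, 28),
}

fom = {name: round(energy_per_bit_pj(p, r), 2) for name, (p, r) in entries.items()}
for name, value in fom.items():
    print(f"{name}: {value} pJ/bit")
```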

Summary of ADC-Based Receiver
- ADC-DSP based receivers are the future for multilevel signaling in advanced CMOS, but their power has to be reduced.
- The DSP needs to be more information efficient: non-uniform quantization is a simple way to improve effective resolution.
- An ADC for wireline differs from a general-purpose ADC. A general-purpose ADC treats each sample as uncorrelated, but in reality channel ISI makes the samples correlated; a predictive ADC is a simple way to take advantage of that.
- Timing recovery is as important as data recovery: a multibit TDC and lower latency are an effective way to improve the timing recovery loop and meet the ADC's jitter requirement.