WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems
Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti+, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor Mudge*
*Ann Arbor; +Arizona State University, Tempe
IISWC '13, Sep 24, 2013
Mobile Device Applications
Mobile Subscription Growth
[Figure: world mobile-cellular subscriptions (billions), 1998 to 2011*]
*Source: ITU report "The World in 2011: ICT Facts and Figures"
Different Protocols
Wireless System
Wireless Benchmark
- Requirements:
  - Expose computation characteristics
  - Include a variety of kernels
  - Support different system configurations
  - Be easy to use for computer architects
Drawbacks of Existing Works
- Incomplete list of kernels
- Outdated kernels
- Lack of kernel details
- Lack of a complete system
- Not free
Our solution: WiBench
WiBench ("Y-Bench")
- Open source kernel suite for wireless systems
- Covers the key signal processing kernels of 802.11a, WCDMA, and LTE
- Supports different configurations in each kernel:
  - BPSK/QPSK/16QAM/64QAM
  - Turbo codeword lengths from 40 to 6144 bits
  - FFT/IFFT sizes of the form $2^a \cdot 3^b \cdot 5^c \cdot 7^d$
- Implements the physical layer of an LTE uplink system
  - Peak data rates from 1.56 to 100 Mbps
- Channel models
  - Let communication engineers evaluate bit error rate performance
Benefits of WiBench
- A variety of state-of-the-art kernels
- A complete LTE uplink system
- Configurability
- Enables hardware feature extraction
- Ease of use
- Open source and free
Outline
- Motivation
- Introduction of WiBench
  - Overview
  - Kernels
  - Application
  - Channel Model
- Demonstration of WiBench Usage
- Conclusion
Overview
| Category | Benchmark |
|---|---|
| Kernels | Channel coding/decoding; Rate matching; Scrambling/Descrambling; Constellation mapping/demapping; MIMO detection; FFT/IFFT; Sub-carrier mapping/demapping; Channel estimation |
| Channel models | Gaussian Random Channel model (GRC); Extended Pedestrian A model (EPA); Extended Vehicular A model (EVA); Extended Typical Urban model (ETU) |
| Applications | LTE uplink |
Kernel Selection
[Figure: transmitter and receiver kernel chains of three protocols, grouped into channel coding and modulation, with a channel in between]
- 802.11a: Convolution encoder → Rate matcher → Constellation mapper → IFFT → channel → FFT → Equalizer → Constellation demapper → Rate matcher → Viterbi decoder, plus channel estimation
- WCDMA: Turbo encoder → Rate matcher → Constellation mapper → Spreader → Scrambler → channel → Descrambler → Despreader → Constellation demapper → Rate matcher → Turbo decoder, plus channel estimation
- LTE: Turbo encoder → Rate matcher → Scrambler → Constellation mapper → MIMO → IFFT → channel → FFT → MIMO detector → Constellation demapper → Descrambler → Rate matcher → Turbo decoder, plus channel estimation
Kernel: Channel Coding
- Controls errors in data transmission
- Turbo code
- Decoding algorithm
  - MAX-LOG-MAP algorithm
  - Iteratively computes the log-likelihood ratio of each bit
- Configuration
  - 1/3 code rate
  - Codeword length: 40 to 6144 bits
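To make the "max-log" in MAX-LOG-MAP concrete, here is a minimal Python sketch, not WiBench's decoder. It assumes the usual log-domain MAP formulation, in which probabilities are added with the Jacobian logarithm $\ln(e^a + e^b) = \max(a,b) + \ln(1 + e^{-|a-b|})$; MAX-LOG-MAP keeps only the max and drops the correction term, which is at most $\ln 2$. The helper names (`max_star`, `max_log`, `bit_llr`) and the example path metrics are hypothetical; a full Turbo decoder applies the same operator inside its forward/backward trellis recursions.

```python
import math
from typing import Callable, Sequence


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a - b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """MAX-LOG approximation: keep only the max, drop the correction term."""
    return max(a, b)


def combine_all(values: Sequence[float], op: Callable[[float, float], float]) -> float:
    """Fold log-domain metrics with max_star (exact) or max_log (approximate)."""
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc


def bit_llr(metrics_bit0: Sequence[float], metrics_bit1: Sequence[float],
            op: Callable[[float, float], float]) -> float:
    """LLR = ln P(bit=0) - ln P(bit=1), from the log-domain metrics of all
    (hypothetical) trellis transitions labelled with bit 0 and with bit 1."""
    return combine_all(metrics_bit0, op) - combine_all(metrics_bit1, op)


m0 = [-1.0, -2.5]  # example metrics of transitions where the bit is 0
m1 = [-3.0, -0.5]  # example metrics of transitions where the bit is 1
print(bit_llr(m0, m1, max_star))  # exact log-MAP value
print(bit_llr(m0, m1, max_log))   # MAX-LOG-MAP value: -0.5
```

Dropping the correction term removes an exp/log evaluation from the innermost loop of the decoder, at the cost of a small loss in decoding accuracy.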
Kernel: Scrambling
- Randomizes the data with a pseudo-random sequence
- Element-wise product of the data and the scrambling sequence
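A minimal Python sketch of bit-level scrambling, assuming the length-31 Gold sequence defined in 3GPP TS 36.211 (Section 7.2) as the pseudo-random sequence; WiBench's own generator and initialization may differ, and `c_init = 0x1234` is an arbitrary example value. Scrambling XORs each bit with the sequence; mapping bits to ±1 turns that XOR into the element-wise product named on the slide, and descrambling is the same operation applied a second time.

```python
def gold_sequence(c_init: int, length: int, nc: int = 1600) -> list:
    """Length-31 Gold sequence as defined in 3GPP TS 36.211, Sec. 7.2."""
    total = nc + length
    x1 = [0] * (total + 31)
    x2 = [0] * (total + 31)
    x1[0] = 1  # x1 starts as 1, 0, 0, ..., 0
    for i in range(31):
        x2[i] = (c_init >> i) & 1  # x2 starts from the bits of c_init
    for n in range(total):
        x1[n + 31] = (x1[n + 3] + x1[n]) % 2
        x2[n + 31] = (x2[n + 3] + x2[n + 2] + x2[n + 1] + x2[n]) % 2
    # Skip the first nc outputs, then XOR the two m-sequences.
    return [(x1[n + nc] + x2[n + nc]) % 2 for n in range(length)]


def scramble(bits, seq):
    """Bit-domain scrambling: XOR each bit with the pseudo-random sequence."""
    return [b ^ c for b, c in zip(bits, seq)]


def to_bipolar(bits):
    """Map 0 -> +1 and 1 -> -1; in this domain XOR becomes a product."""
    return [1 - 2 * b for b in bits]


data = [1, 0, 1, 1, 0, 0, 1, 0]
seq = gold_sequence(c_init=0x1234, length=len(data))  # arbitrary example c_init
tx = scramble(data, seq)
assert scramble(tx, seq) == data  # descrambling is the same XOR
# Element-wise product of +/-1 values matches XOR on bits.
assert [a * b for a, b in zip(to_bipolar(data), to_bipolar(seq))] == to_bipolar(tx)
```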
Kernel: Constellation Mapping
- Makes the signals match the channel characteristics
- Mapping/demapping
  - Binary bits ↔ complex values
- Configuration
  - BPSK
  - QPSK
  - 16QAM
  - 64QAM
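As an illustration of mapping and demapping, a Python sketch for QPSK only, assuming the LTE QPSK table of 3GPP TS 36.211 (bit pair $b_0 b_1 \mapsto ((1-2b_0) + j(1-2b_1))/\sqrt{2}$) and, for the soft output, an AWGN channel with complex noise variance $N_0$ and equal bit priors. The function names are hypothetical; 16QAM and 64QAM follow the same pattern with larger tables. The soft demapper returns $\mathrm{LLR} = \ln \frac{P(b=0 \mid y)}{P(b=1 \mid y)}$, which for this Gray-mapped QPSK reduces to a scaled real or imaginary part.

```python
import math

A = 1 / math.sqrt(2)  # scales QPSK points to unit average energy


def qpsk_map(bits):
    """Map bit pairs (b0, b1) to ((1 - 2*b0) + j(1 - 2*b1)) / sqrt(2)."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [complex(A * (1 - 2 * bits[i]), A * (1 - 2 * bits[i + 1]))
            for i in range(0, len(bits), 2)]


def qpsk_demap_hard(symbols):
    """Hard decision: a negative real/imaginary part means the bit is 1."""
    bits = []
    for s in symbols:
        bits.append(1 if s.real < 0 else 0)
        bits.append(1 if s.imag < 0 else 0)
    return bits


def qpsk_demap_llr(symbols, n0):
    """LLRs ln P(b=0|y)/P(b=1|y) for AWGN with complex noise variance n0."""
    scale = 4 * A / n0
    llrs = []
    for s in symbols:
        llrs.extend([scale * s.real, scale * s.imag])
    return llrs


bits = [0, 0, 0, 1, 1, 0, 1, 1]
symbols = qpsk_map(bits)
assert qpsk_demap_hard(symbols) == bits
print(qpsk_demap_llr(symbols, n0=0.5))
```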
Kernel: MIMO
- Improves data transmission rate and quality
- Detection algorithms
  - Least-squares-based detection
  - Tree-based sphere detection
- Configuration
  - Supports 1x1, 2x2, and 4x4 configurations
[Figure: 2x2 MIMO link; transmitter antennas x1, x2, receiver antennas y1, y2, channel gains H11, H12, H21, H22]
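With the figure's labels, and assuming $H_{ij}$ is the gain from transmit antenna $j$ to receive antenna $i$, the 2x2 case reads $y_1 = H_{11}x_1 + H_{12}x_2 + n_1$ and $y_2 = H_{21}x_1 + H_{22}x_2 + n_2$, i.e. $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$. Least-squares detection computes $\hat{\mathbf{x}} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H\mathbf{y}$, which for a square, invertible $\mathbf{H}$ is simply $\mathbf{H}^{-1}\mathbf{y}$. The Python sketch below covers only the 2x2 case followed by a QPSK hard decision; the channel and noise values are made up, the helper names are hypothetical, and tree-based sphere detection is not shown.

```python
def ls_detect_2x2(h, y):
    """Least-squares (zero-forcing) detection for a 2x2 channel.

    h is [[h11, h12], [h21, h22]] with h[i][j] linking transmit antenna j to
    receive antenna i (assumed convention); y is [y1, y2].
    For a square, invertible H, (H^H H)^-1 H^H y equals H^-1 y.
    """
    (h11, h12), (h21, h22) = h
    det = h11 * h22 - h12 * h21
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (nearly) singular")
    y1, y2 = y
    return [(h22 * y1 - h12 * y2) / det, (h11 * y2 - h21 * y1) / det]


def slice_qpsk(z):
    """Hard decision to the nearest unit-energy QPSK point."""
    a = 2 ** -0.5
    return complex(a if z.real >= 0 else -a, a if z.imag >= 0 else -a)


a = 2 ** -0.5
H = [[1 + 1j, 0.5], [0.2j, 1 - 0.5j]]        # example channel
x = [complex(a, a), complex(a, -a)]           # transmitted QPSK symbols
y = [H[0][0] * x[0] + H[0][1] * x[1], H[1][0] * x[0] + H[1][1] * x[1]]
noise = [0.05 - 0.02j, -0.03 + 0.04j]         # example noise samples
y_noisy = [yi + ni for yi, ni in zip(y, noise)]
x_hat = ls_detect_2x2(H, y_noisy)
print([slice_qpsk(z) for z in x_hat])
```

For 4x4, the same estimate would normally come from a general linear solver (or a QR decomposition) rather than the closed-form 2x2 inverse.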
Kernel: FFT
- Fast algorithm for computing the DFT
- Uses the FFTW library
  - A C library designed for high performance
- Configuration
  - FFT/IFFT sizes of the form $2^a \cdot 3^b \cdot 5^c \cdot 7^d$
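To show why sizes are restricted to $2^a \cdot 3^b \cdot 5^c \cdot 7^d$, here is a pure-Python sketch of a recursive mixed-radix (Cooley-Tukey, decimation-in-time) FFT that splits by the smallest radix in {2, 3, 5, 7}, checked against a direct DFT. It is for illustration only: it is neither FFTW nor WiBench code, its combine step costs O(n·p) per level instead of using optimized butterflies, and the helper names are hypothetical.

```python
import cmath

RADICES = (2, 3, 5, 7)


def is_supported_size(n: int) -> bool:
    """True if n = 2^a * 3^b * 5^c * 7^d for some non-negative a, b, c, d."""
    if n < 1:
        return False
    for p in RADICES:
        while n % p == 0:
            n //= p
    return n == 1


def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT for lengths with factors 2, 3, 5, 7 only."""
    n = len(x)
    if not is_supported_size(n):
        raise ValueError(f"unsupported FFT size {n}")
    if n == 1:
        return [complex(x[0])]
    p = next(r for r in RADICES if n % r == 0)  # smallest radix dividing n
    m = n // p
    # p interleaved sub-sequences x[r], x[r+p], x[r+2p], ..., each of length m.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    out = []
    for k in range(n):
        # X[k] = sum_r W_n^(r*k) * F_r[k mod m], with W_n = exp(-2*pi*i/n)
        acc = 0j
        for r in range(p):
            acc += cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m]
        out.append(acc)
    return out


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


print(is_supported_size(1200), is_supported_size(1100))  # True False
```

1200 = 2^4 · 3 · 5^2 passes the check, while 1100 = 2^2 · 5^2 · 11 fails because of the factor 11.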
Kernel: Channel Estimation
- Estimates the channel state information
- Pilot-based channel estimation
  - Least Square (LS): less complexity
  - Minimum Mean Square Error (MMSE): more accurate estimation
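A minimal Python sketch of pilot-based least-squares estimation followed by one-tap equalization, assuming a noise-free channel that is identical on the pilot and data symbols; the channel, pilot, and data values are invented, and the function names are hypothetical. The LS estimate is $\hat{H}[k] = Y[k]/X[k]$ on each pilot subcarrier. An MMSE estimator would additionally weight this estimate using the noise variance and the channel statistics, which is where its extra accuracy and cost come from; it is not shown here.

```python
def ls_channel_estimate(rx_pilots, tx_pilots):
    """Per-subcarrier least-squares estimate: H_hat[k] = Y[k] / X[k]."""
    if len(rx_pilots) != len(tx_pilots):
        raise ValueError("pilot vectors must have the same length")
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]


def zf_equalize(rx_data, h_est):
    """One-tap frequency-domain equalizer: X_hat[k] = Y[k] / H_hat[k]."""
    return [y / h for y, h in zip(rx_data, h_est)]


h_true = [1 + 0.5j, 0.3 - 1j, -0.8 + 0.2j, 0.6 + 0.6j]  # example channel
pilots = [1 + 0j, -1 + 0j, 1j, -1j]                      # example known pilots
data = [0.7 + 0.7j, -0.7 + 0.7j, 0.7 - 0.7j, -0.7 - 0.7j]
rx_p = [h * p for h, p in zip(h_true, pilots)]           # noise-free for clarity
rx_d = [h * d for h, d in zip(h_true, data)]
h_hat = ls_channel_estimate(rx_p, pilots)
x_hat = zf_equalize(rx_d, h_hat)
print(x_hat)
```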
Application: LTE Uplink System
- Demonstration platform
- System configurations support peak data rates from 1.56 to 100 Mbps
[Figure: LTE uplink block diagram. Transmitter (user equipment): Turbo encoder, rate matcher, scrambler, constellation mapper, transform precoder, subcarrier mapper, transmit layers 1 and 2, SC-FDMA modulation, antennas 1 and 2. Channel. Receiver (base station): antennas 1 and 2, SC-FDMA demodulation, channel estimator, subcarrier demapper, frequency domain equalizer, transform decoder, constellation demapper, descrambler, rate matcher, Turbo decoder.]
Channel Models
- Standard channel models
  - Gaussian Random Channel model
  - Extended Pedestrian A model
  - Extended Vehicular A model
  - Extended Typical Urban model
- Users can evaluate the performance of different systems under different channel models
  - Bit error rate vs. signal-to-noise ratio
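The BER-vs-SNR methodology can be illustrated with the simplest case: uncoded BPSK over an additive white Gaussian noise channel, measured by Monte Carlo simulation and compared with the closed form $P_b = \tfrac{1}{2}\operatorname{erfc}\big(\sqrt{E_b/N_0}\big)$. This Python sketch only stands in for the method; it is not one of the four WiBench channel models, and the function names are hypothetical.

```python
import math
import random


def bpsk_awgn_ber(ebn0_db: float, n_bits: int, seed: int = 0) -> float:
    """Monte Carlo BER of uncoded BPSK over an AWGN channel."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))  # Eb = 1; noise variance N0/2 per real dimension
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        rx = (1.0 - 2.0 * bit) + rng.gauss(0.0, sigma)  # 0 -> +1, 1 -> -1
        decided = 1 if rx < 0 else 0
        errors += decided != bit
    return errors / n_bits


def bpsk_awgn_ber_theory(ebn0_db: float) -> float:
    """Closed form: Pb = Q(sqrt(2 Eb/N0)) = 0.5 * erfc(sqrt(Eb/N0))."""
    return 0.5 * math.erfc(math.sqrt(10 ** (ebn0_db / 10)))


for snr_db in (0, 2, 4, 6):
    print(snr_db, bpsk_awgn_ber(snr_db, 20_000), bpsk_awgn_ber_theory(snr_db))
```

Swapping the AWGN line for a fading channel (plus an equalizer) is what turns the same sweep into a comparison across channel models.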
Experimental Configurations
| Feature | Desktop platform (Intel i7) | Mobile platform (Intel Atom) |
|---|---|---|
| Frequency | 3.40 GHz | 1.60 GHz |
| Out-of-order | Yes | No |
| Single-core issue width | 4 | 2 |
| I-cache | 32 KB | 32 KB |
| D-cache | 32 KB | 24 KB |
| L2 cache | 256 KB | 512 KB |
| Main memory | 16 GB DDR3 | 4 GB SDRAM |
Breakdown of LTE Uplink Runtime (i7)
- The configuration is for a 100 Mbps peak data rate
[Figure: pie chart of LTE uplink runtime on the i7, split into Turbo decoder, constellation demapping, equalization, SC-FDMA, predecoding, descrambling, rate matching, sub-carrier demapping, and other]
Vectorization Impact (i7)
- Automatic vectorization by the compiler
[Figure: vectorization speedup on the i7 over non-vectorized code (axis from 1.00 to 1.50) for constellation demapping, descrambling, channel estimation, LS detection, tree-based detection, rate matching, FFT, IFFT, and Turbo decoder]
Architectural Implications
- A dedicated hardware accelerator for the Turbo decoder
- Efficient execution of constellation demapping and equalization
- A wide SIMD engine to exploit the data-level parallelism (DLP) in the signal processing kernels
Inter-kernel Data Transfer
- The configuration is for a 100 Mbps peak data rate
[Figure: data transferred between receiver kernels (antennas, SC-FDMA demodulation, channel estimator, subcarrier demapper, frequency domain equalizer, transform decoder, constellation demapper, descrambler, rate matcher, Turbo decoder); edge labels: 37.5 KB (twice, at the channel estimator), 448 KB, 225 KB (three times), and 450 KB (three times)]
Different Configurations
- Keep the peak data rate the same
- Change the system configurations
| Configuration | FFT | IFFT | MIMO | Constellation Demapping |
|---|---|---|---|---|
| A | 256 | 150 | 2x2 | 16QAM |
| B | 512 | 300 | 1x1 | 16QAM |
| C | 512 | 300 | 2x2 | QPSK |
| D | 1024 | 600 | 1x1 | QPSK |
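One way to see why these four configurations can keep the same peak data rate: multiply the number of data subcarriers by the number of spatial layers and the bits per constellation symbol. The Python sketch below assumes that the IFFT column gives the data subcarriers per layer and that a 2x2 configuration carries two layers; under those assumptions every configuration carries the same number of coded bits per SC-FDMA symbol. The names are hypothetical.

```python
# Bits carried by each constellation symbol.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}

# (FFT size, IFFT size, spatial layers, constellation) from the configuration table.
CONFIGS = {
    "A": (256, 150, 2, "16QAM"),
    "B": (512, 300, 1, "16QAM"),
    "C": (512, 300, 2, "QPSK"),
    "D": (1024, 600, 1, "QPSK"),
}


def coded_bits_per_symbol(subcarriers: int, layers: int, constellation: str) -> int:
    """Coded bits in one SC-FDMA symbol, assuming the IFFT size equals the
    number of data subcarriers per layer (an assumption, not from the slide)."""
    return subcarriers * layers * BITS_PER_SYMBOL[constellation]


for name, (fft, ifft, layers, const) in CONFIGS.items():
    print(name, fft, coded_bits_per_symbol(ifft, layers, const))
# A 256 1200
# B 512 1200
# C 512 1200
# D 1024 1200
```

For example, A gives 150 × 2 × 4 = 1200 and D gives 600 × 1 × 2 = 1200: halving the subcarriers is compensated by doubling either the layers or the bits per symbol.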
Different Configurations (i7)
[Figure: kernel runtimes (ms) for processing one LTE uplink subframe at 12.5 Mbps on the i7, excluding the Turbo decoder, for configurations A:256_2x2_16QAM, B:512_1x1_16QAM, C:512_2x2_QPSK, and D:1024_1x1_QPSK; kernels: constellation demapping, equalization, SC-FDMA, predecoding, descrambling, rate matching, sub-carrier demapping]
Conclusion
- We designed an open source, configurable wireless signal processing kernel suite
- We included an LTE uplink system in the benchmark that illustrates how to build a wireless application by assembling kernels
- WiBench provides several standard channel models
- We demonstrated how WiBench can be used to guide hardware design
Thanks! Any questions?
http://wibench.eecs.umich.edu
Backup
Vectorization Impact (Atom)
[Figure: vectorization speedup on the Atom over non-vectorized code (axis from 1.00 to 1.90) for constellation demapping, descrambling, channel estimation, LS detection, tree-based detection, rate matching, FFT, IFFT, and Turbo decoder]
LTE uplink runtime breakdown (Atom)
[Figure: pie chart of LTE uplink runtime on the Atom, split into Turbo decoder, constellation demapping, equalization, SC-FDMA, predecoding, descrambling, rate matching, subcarrier demapping, and other]
Different configurations (Atom)
[Figure: kernel runtimes (ms) for processing one LTE uplink subframe at 12.5 Mbps on the Atom, excluding the Turbo decoder, for configurations A:256_2x2_16QAM, B:512_1x1_16QAM, C:512_2x2_QPSK, and D:1024_1x1_QPSK; kernels: constellation demapping, equalization, SC-FDMA, predecoding, descrambling, rate matching, sub-carrier demapping]
Bit Error Rate
[Figure: LTE uplink system BER; bit error rate ($10^{0}$ down to $10^{-6}$) vs. Eb/N0 in dB (0 to 40), with curves for perfect CSI and FD LS]
Related Work
- MiBench
  - Contains only a small portion of the key kernels in a wireless signal processing system
- LTE Uplink Receiver PHY Benchmark
  - Aims to simulate the workload variation in an LTE base station
  - Misses the details of several important kernels
    - The Turbo decoder is represented simply as a sleep function
- BDTI OFDM Receiver Benchmark
  - Not free
Related Work
- GNU Radio
  - A free and open source software toolkit providing signal processing blocks for software radio implementation
- Different goals
  - GNU Radio is designed to implement software-defined radio on commodity hardware
  - WiBench is designed to guide the hardware design of domain-specific solutions
- Building blocks
  - GNU Radio does not currently provide such kernels
  - WiBench provides well-built key kernels of mainstream wireless protocols, as well as a complete LTE uplink system
- WiBench is sometimes more efficient
  - Turbo decoder: WiBench executes over 20% fewer instructions than GNU Radio