FPGA Implementation of High Performance LDPC Decoder using Modified 2-bit Min-Sum Algorithm

Second International Conference on Computer Research and Development

Vikram Arkalgud Chandrasetty and Syed Mahfuzul Aziz
School of Electrical and Information Engineering
University of South Australia
Mawson Lakes, SA 5095, Australia
vikramac@ieee.org, mahfuz.aziz@unisa.edu.au

Abstract- In this paper, a reduced-complexity Low-Density Parity-Check (LDPC) decoder is designed and implemented on an FPGA using a modified 2-bit Min-Sum algorithm. Simulation results reveal that the proposed decoder achieves an improvement of 1.5 dB in Eb/No at a bit error rate (BER) of $10^{-5}$ and requires fewer decoding iterations than the original 2-bit Min-Sum algorithm. With BER performance comparable to that of the 3-bit Min-Sum algorithm, the decoder implemented using the modified 2-bit Min-Sum algorithm saves about 18% of FPGA slices and can achieve an average throughput of 10.2 Gbps at 4 dB Eb/No.

Keywords- digital communication; error correction coding; iterative decoding; field programmable gate array; logic design

I. INTRODUCTION

Low-Density Parity-Check (LDPC) codes [1] have become one of the most attractive classes of error correction codes due to their excellent performance [2] and suitability for high data rate applications such as WiMAX and DVB-S2 [3]. The inherent structure of the LDPC code allows the decoder to achieve a high degree of parallelism in practical implementations [4]. LDPC decoding algorithms are primarily iterative and are based on the belief propagation message passing algorithm. The complexity of the decoding algorithm is critical to the overall performance of the LDPC decoder. Various algorithms have been proposed to achieve a tradeoff between complexity and performance [5, 6]. The Sum-Product Algorithm (SPA) [7], a soft decision based message passing algorithm, can achieve the best performance, but with high decoding complexity. Bit-Flip, in contrast, is a hard decision based algorithm with the least decoding complexity, but it suffers from poor performance [6].
The Min-Sum Algorithm (MSA) is a simplified version of SPA that reduces implementation complexity at the cost of a slight degradation in performance [7]. The MSA performs simple arithmetic and logical operations, which makes it suitable for hardware implementation. However, the performance of the algorithm is significantly affected by the quantization of the soft input messages [8]. Reducing the quantization of the messages is important for reducing the implementation complexity and hardware resources of the decoder, but this advantage comes with a degradation in decoding performance. The literature contains limited information on the performance and hardware implementation of such low-complexity algorithms, especially the 2-bit MSA.

This paper discusses the performance and hardware implementation complexity associated with 2-bit MSA. Modifications are proposed to improve the overall performance of the algorithm to a level comparable to that of 3-bit MSA. Simulation results reveal that the proposed Modified 2-bit Min-Sum (MMS2) algorithm achieves a significant improvement in decoding performance, in terms of both bit error rate (BER) and average decoding iterations, compared to 2-bit MSA. With BER performance comparable to that of 3-bit MSA, an FPGA implementation of the proposed MMS2 can save up to 18% of slices and yields a 23% improvement in the maximum operating frequency of the LDPC decoder.

II. PROPOSED MODIFIED 2-BIT MIN-SUM ALGORITHM

Although the simplified check node operation in MSA has reduced complexity compared to SPA, the former still requires high precision messages to be exchanged between the decoding nodes in the decoder. This is important to achieve decoding performance comparable to that of SPA, with the least performance degradation. The level of quantization used in the soft channel messages, represented as Log-Likelihood Ratios (LLR), and in the extrinsic messages of MSA directly impacts the decoding performance. As the quantization length of the messages decreases, both the performance and the complexity of the algorithm reduce.
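To make the notion of quantization length concrete, the following minimal sketch quantizes a few example LLR values to different bit widths. It assumes a uniform sign-magnitude quantizer with saturation and a unit step size; the function name `quantize_llr`, the step size and the sample values are illustrative assumptions and are not taken from the paper.

```python
def quantize_llr(llr, bits, step=1.0):
    """Uniform sign-magnitude quantizer with saturation (illustrative assumption).

    One bit carries the sign, so the magnitude saturates at 2**(bits - 1) - 1.
    """
    max_mag = 2 ** (bits - 1) - 1          # e.g. 7 for 4 bits, 1 for 2 bits
    mag = min(int(round(abs(llr) / step)), max_mag)
    return -mag if llr < 0 else mag


# Hypothetical soft channel values, chosen only for illustration
samples = [0.4, -1.6, 2.7, -5.2, 9.1]
for bits in (5, 4, 3, 2):
    print(bits, [quantize_llr(x, bits) for x in samples])
```

As the bit width shrinks, more of the large-magnitude values collapse onto the same saturated level, which is the loss of reliability information that the text refers to.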
Studies have shown that there is only a slight performance loss in going from 5-bit to 4-bit or even 3-bit quantization [8]. Using 2-bit quantized messages in MSA leads to a massive reduction in implementation complexity but suffers from a significant loss in decoder performance compared to 3-bit MSA. The performance of 2-bit MSA has been improved through the optimization reported in [9]. It is further improved by the Modified 2-bit Min-Sum (MMS2) algorithm proposed in this paper. The check node and variable node operations of the MMS2 algorithm are described as follows.

A. Variable Node Operation

The variable node operation is similar to that of the original Min-Sum algorithm [7]. The difference in the proposed algorithm is that the variable node ($V_i$) performs higher precision quantized LLR operations ($LLR_n$), but maps the computed result to a 2-bit message to be passed to the check nodes, as in (1). The 2-bit message consists of a sign bit and a magnitude bit representing the computed LLR sum. The mapping is based on a threshold ($T$) obtained from simulations. Depending on the message received from the check nodes ($C_j$), the 2-bit information is mapped back to constant values ($\pm W$ or $\pm w$) to perform the LLR sum operation in the variable node. These constant values are also obtained from simulations. The functions for mapping the 2-bit messages are shown in (2) and (3).

$$V_i = g\Big(LLR_n + \sum_{j \neq i} f(C_j)\Big) \tag{1}$$

where n = 1, 2, ..., N (variable nodes) and i = j = 1, 2, ..., $d_v$ (degree of variable node n).

$$g(y) = \begin{cases} 01 & \text{if } y > T \\ 00 & \text{if } 0 \le y \le T \\ 10 & \text{if } -T \le y < 0 \\ 11 & \text{if } y < -T \end{cases} \tag{2}$$

$$f(x) = \begin{cases} W & \text{if } x = 01 \\ w & \text{if } x = 00 \\ -w & \text{if } x = 10 \\ -W & \text{if } x = 11 \end{cases} \tag{3}$$

where $T$ is the optimized threshold for mapping, $W$ is the optimized higher integer constant and $w$ is the optimized lower integer constant, all obtained from simulations. Monte Carlo simulations are carried out to obtain the $T$, $W$ and $w$ values that provide the best decoding performance.

978-0-7695-4043-6/10 $26.00 © 2010 IEEE. DOI 10.1109/ICCRD.2010.186

B. Check Node Operation

In MSA, the check node determines the product of the signs of the incoming messages and finds the minimum of the magnitudes of the input messages [7]. In the proposed MMS2, the product of the signs of the incoming messages is computed using an XOR operation ($S_k$), as in (4), and the minimum is determined using an AND operation ($M_k$), as in (5). The check node output message ($C_k$) is obtained simply by concatenating the sign bit and the magnitude bit, as in (6). The message passing between the nodes continues until the parity check is satisfied or the maximum number of iterations is reached.

$$S_k = V_1^{(s)} \oplus V_2^{(s)} \oplus \dots \oplus V_l^{(s)}, \quad l \neq k \tag{4}$$

$$M_k = V_1^{(m)} \,\&\, V_2^{(m)} \,\&\, \dots \,\&\, V_l^{(m)}, \quad l \neq k \tag{5}$$

$$C_k = \{S_k, M_k\} \tag{6}$$

where l = k = 1, 2, ..., $d_c$ (degree of check node); $S$ is the sign bit of the check node message; $M$ is the magnitude bit of the check node message; $V_l^{(s)}$ is the sign bit of message $l$ from the variable node; and $V_l^{(m)}$ is the magnitude bit of message $l$ from the variable node.

The message mapping in the variable node described above is similar to that presented in [9]. However, the proposed MMS2 algorithm eliminates the overhead of the scaling factor used in [9], uses higher precision LLRs for the variable node operation and incorporates simple logic for the check node operation. These modifications lead to a further improvement in performance while retaining the reduced complexity of routing only 2-bit messages between the variable and check nodes in the LDPC decoder.

III. PERFORMANCE ANALYSIS

The performance of the proposed MMS2 algorithm has been evaluated by developing a software model using C programs in the MATLAB environment. The LDPC codes were generated using the Progressive Edge Growth (PEG) algorithm [10]. Simulations were carried out assuming that the code words were modulated using Binary Phase Shift Keying (BPSK) and passed over an Additive White Gaussian Noise (AWGN) channel [11].

In [12], a ½ rate (3, 6) regular 1200-bit LDPC code with a maximum of 10 decoding iterations was used for the FPGA implementation of 3-bit MSA. This specification has been used for simulation and comparison of the proposed MMS2 algorithm. The corresponding FPGA implementation results are compared in Section IV (A). The LLR quantization used for MMS2 is 4-bit. In the variable node, a threshold ($T$) of 2 is used for the 4-bit to 2-bit mapping, and the weights used for the 2-bit to 4-bit mapping are $W = 3$ and $w = 1$.

The BER performance of MMS2 compared to the original 2-bit and 3-bit MSA is shown in Fig. 1. MMS2 achieves a gain of 1.5 dB at a BER of $10^{-5}$ over 2-bit MSA and suffers a loss of about 0.3 dB at a BER of $10^{-5}$ relative to 3-bit MSA. A significant reduction in average decoding iterations for MMS2 compared to 2-bit MSA can be observed in Fig. 2.
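The MMS2 node operations of Section II can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C or Verilog model: messages are represented as (sign, magnitude) bit pairs, all function names are hypothetical, the handling of sums exactly equal to ±T is an assumption, and saturation of the LLR sum is not modelled. The constants T = 2, W = 3 and w = 1 are the values reported above.

```python
from functools import reduce
from operator import and_, xor

T, W_HIGH, W_LOW = 2, 3, 1   # threshold and weights reported in Section III


def g(y, t=T):
    """Map an LLR sum to a 2-bit message (sign bit, magnitude bit)."""
    sign = 1 if y < 0 else 0
    mag = 1 if abs(y) > t else 0   # assumption: |y| == t maps to low magnitude
    return (sign, mag)


def f(msg, w_high=W_HIGH, w_low=W_LOW):
    """Map a 2-bit message back to a signed integer weight (+/-W or +/-w)."""
    sign, mag = msg
    value = w_high if mag else w_low
    return -value if sign else value


def variable_node(llr, check_msgs):
    """Outgoing message on each edge: g(LLR + sum of f over the other edges)."""
    values = [f(c) for c in check_msgs]
    total = llr + sum(values)
    return [g(total - v) for v in values]   # exclude the edge's own input


def check_node(var_msgs):
    """Outgoing message on each edge: XOR of signs, AND of magnitude bits."""
    out = []
    for k in range(len(var_msgs)):
        others = var_msgs[:k] + var_msgs[k + 1:]
        sign = reduce(xor, (s for s, _ in others))
        mag = reduce(and_, (m for _, m in others))
        out.append((sign, mag))
    return out


# One check node with three incoming messages (degree chosen for brevity)
print(check_node([(0, 1), (1, 1), (0, 0)]))
# One variable node with channel LLR 2 and three incoming check messages
print(variable_node(2, [(0, 1), (1, 0), (0, 0)]))
```

The AND in `check_node` acts as a minimum because the magnitude bit only distinguishes "high" from "low": the output is high only when every other incoming message is high.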
IV. FPGA IMPLEMENTATION

A fully parallel LDPC decoder architecture was designed for the proposed MMS2 algorithm. The parameterized hardware model was developed in the Verilog Hardware Description Language (HDL) and synthesized using the Xilinx synthesis tool. The behavioral and post-synthesis simulations were carried out using ModelSim. The block diagram of the designed LDPC decoder is shown in Fig. 3.

The decoder has global Clock and synchronous Reset inputs. The maximum permissible number of iterations is determined by the value supplied at the MaxIter input, which can be set to a value in the range 0-15. The MaxIter value is read when the Configure input is high. The LLRs are fed into the decoder using the Load control signal. The decoding process is initiated by the Start signal. After decoding is completed, the Decoded Data can be read when indicated by the DataOut Ready signal. The receipt of data can be acknowledged on DataOut Ack to receive the next decoded bit. The number of iterations used for decoding can be obtained from the Used Iter port. The Decoder Status port indicates the state (Active/Idle) of the decoder.

Figure 3. Block diagram of the designed LDPC decoder (ports: LLR Input, Load, Start, MaxIter, Clock, Reset, Configure, Decoder Status, Decoded Data, Used Iter, DataOut Ready, DataOut Ack)

Note that the LLRs are loaded into the decoder serially, one at a time. Similarly, the Decoded Data is latched out serially, bit by bit. This technique is used because of the limited number of Input/Output ports available on the FPGA. It also provides flexibility for implementing LDPC decoders with variable code length without modifying the port configuration.

Figure 1. BER performance of MMS2 compared to MSA

Figure 2. Average decoding iterations for MMS2 and MSA

A. Comparative Analysis

A parallel architecture for a 1200-bit LDPC decoder, as described in Section III, has been designed, synthesized, placed and routed for a Xilinx Virtex 4 (XC4VLX200) FPGA. The maximum operating clock frequency achievable for the decoder is 123 MHz. The throughput of the decoder is calculated using the formula presented in [12]. This calculation excludes the serial load time of the individual LLRs (before starting the decoding process) and the latch time of the decoded data (after decoding is complete). At an average of 7.2 decoding iterations at 4 dB Eb/No (see Fig. 2), the proposed decoder can achieve an average throughput of 10.2 Gbps. A comparison of the proposed decoder with that presented in [12] is shown in Table I.

TABLE I. COMPARISON OF FULLY PARALLEL LDPC DECODERS

Parameter | In [12] | Proposed | Improvement
----------|---------|----------|------------
LDPC Code | ½ rate (3,6) regular 1200-bit | ½ rate (3,6) regular 1200-bit | -
Algorithm | 3-bit Min-Sum | MMS2 | -
BER | $10^{-5}$ at 3.6 dB | $10^{-5}$ at 3.9 dB | 0.3 dB (loss)
FPGA | Xilinx Virtex 4 (XC4VLX200) | Xilinx Virtex 4 (XC4VLX200) | -
Slices | 40,613 | 33,435 | 18%
LUTs | 69,038 | 58,053 | 16%
Registers | 18,95 | 15,691 | 17%
Clock | 100 MHz | 123 MHz | 23%
Throughput (Avg.) | Not Available | 10.2 Gbps at 4 dB | -
Throughput (Min) | 6 Gbps at 10 iterations | 7.4 Gbps at 10 iterations | 23%
Results | Synthesized, Placed and Routed | Synthesized, Placed and Routed | -
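The paper takes its throughput formula from [12] and does not restate it. As a sanity check only, the sketch below assumes that throughput equals code length times clock frequency divided by the number of clock cycles spent decoding, with two clock cycles per iteration. Both the function name `throughput_gbps` and the two-cycles-per-iteration figure are assumptions rather than statements from the paper; they are chosen because the resulting values are close to the figures quoted in this section.

```python
def throughput_gbps(code_length_bits, clock_mhz, iterations, cycles_per_iteration=2):
    """Decoder throughput in Gbps, excluding serial load and unload time.

    cycles_per_iteration=2 is an assumption, not a value stated in the paper.
    """
    decode_cycles = cycles_per_iteration * iterations
    bits_per_second = code_length_bits * clock_mhz * 1e6 / decode_cycles
    return bits_per_second / 1e9


# Proposed decoder: 1200-bit code at 123 MHz
print(throughput_gbps(1200, 123, 10))    # worst case, 10 iterations: about 7.38
print(throughput_gbps(1200, 123, 7.2))   # average of 7.2 iterations: about 10.25
# Decoder of [12]: 1200-bit code at 100 MHz, 10 iterations
print(throughput_gbps(1200, 100, 10))    # about 6.0
```

Under this assumed formula, the minimum throughput scales directly with clock frequency, so the 23% clock improvement carries over unchanged to the minimum throughput.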

B. Implementation Results

The 1200-bit LDPC decoder presented above was not implemented on the FPGA, as a Xilinx Virtex 4 was not available. However, a smaller version of the decoder has been implemented on a Xilinx Virtex 5 FPGA development board. A ½ rate (3, 6) regular 648-bit LDPC code that complies with the WLAN standard [13] was chosen for implementation. A comprehensive testing environment was developed using RS232 serial communication [14] to test the decoder on the FPGA. The setup used to test the LDPC decoder is shown in Fig. 4.

An RS232 transceiver module was embedded on the FPGA along with the LDPC decoder module to interface with the RS232 port. MATLAB was used to communicate with the FPGA through the serial port. LLRs were generated and sent to the FPGA with appropriate control signals for decoding. The decoded data received via the same serial port was used to analyze the performance of the decoder. The BER performance and the average iterations required by the decoder implemented on the FPGA, compared to the software model, are shown in Fig. 5 and Fig. 6 respectively. A summary of the FPGA implementation results of the LDPC decoder, including the RS232 serial communication module, is shown in Table II. At a maximum operating frequency of 113 MHz, the implemented LDPC decoder can achieve an average throughput of 5.4 Gbps with an average of 6.8 iterations at 4.25 dB Eb/No.

Figure 4. Block diagram of FPGA test setup for LDPC decoder (MATLAB on a personal computer, serial port connection, RS232 Rx/Tx module and LDPC decoder on the FPGA)

Figure 5. BER performance of LDPC decoder from FPGA

Figure 6. Average decoding iterations of LDPC decoder from FPGA

TABLE II. SUMMARY OF FPGA IMPLEMENTATION RESULTS

Resources | LDPC Decoder
----------|-------------
Slices | 7,755
LUTs | 22,01
Registers | 8,555
Clock | 113 MHz
FPGA | Xilinx Virtex 5 (XC5VLX110T-3FF1136)

V. CONCLUSION

In this paper, a modified 2-bit Min-Sum algorithm is proposed to reduce the implementation complexity of LDPC decoders. It is shown that, with a slight performance degradation of about 0.3 dB at a BER of $10^{-5}$ compared to 3-bit Min-Sum, the proposed decoder saves about 18% of FPGA slices and raises the maximum clock frequency, and hence the minimum throughput, by 23%. The performance of the proposed algorithm and its feasibility for practical systems are also verified by implementing a decoder suitable for WLAN. The proposed LDPC decoder is therefore an attractive solution for applications requiring high performance.

ACKNOWLEDGMENT

The authors wish to acknowledge Dr Mark Ho of the School of Electrical and Information Engineering, University of South Australia, for his advice on carrying out the performance simulations.

REFERENCES

[1] R. Gallager, Low-density parity-check codes. IRE Transactions on Information Theory, 1962. 8(1): p. 21-28.
[2] D.J.C. MacKay and R.M. Neal, Near Shannon limit performance of low density parity check codes. Electronics Letters, 1997. 33(6): p. 457-458.
[3] Tetsuo Nozawa (2005) LDPC Adopted for Use in Comms, Broadcasting, HDDs. Nikkei Electronics Asia.
[4] G.L.L. Nicolas Fau (2008) LDPC (Low Density Parity Check) - A Better Coding Scheme for Wireless PHY Layers. Design and Reuse Industry Article.
[5] S. Papaharalabos and P.T. Mathiopoulos, Simplified sum-product algorithm for decoding LDPC codes with optimal performance. Electronics Letters, 2009. 45(2): p. 116-117.
[6] N. Miladinovic and M.P.C. Fossorier, Improved bit-flipping decoding of low-density parity-check codes. IEEE Transactions on Information Theory, 2005. 51(4): p. 1594-1606.
[7] A. Anastasopoulos. A comparison between the sum-product and the min-sum iterative detection algorithms based on density evolution. in IEEE Global Telecommunications Conference. 2001.
[8] R. Zarubica, et al. Efficient quantization schemes for LDPC decoders. in IEEE Military Communications Conference. 2008.
[9] Z. Cui and Z. Wang, Improved low-complexity low-density parity-check decoding. IET Communications, 2008. 2(8): p. 1061-1068.
[10] X.-Y. Hu. Software to Construct PEG LDPC code. 2008 [cited 2009 May]; Available from: http://www.inference.phy.cam.ac.uk/mackay/peg_ecc.html.
[11] J.G. Proakis, Digital communications. 5th ed., ed. M. Salehi. 2008, New York: McGraw-Hill.
[12] R. Zarubica, S.G. Wilson, and E. Hall. Multi-Gbps FPGA-Based Low Density Parity Check (LDPC) Decoder Design. in IEEE Global Telecommunications Conference. 2007.
[13] IEEE 802.11n Wireless LAN Medium Access Control MAC and Physical Layer PHY specifications. 2006, IEEE 802.11n-D1.0.
[14] RS232 Tutorial on Data Interface and Cables. 2009 [cited 2009 Sep]; Available from: http://www.arcelect.com/rs232.htm.