FPGA Implementation of High Performance LDPC Decoder using Modified 2-bit Min-Sum Algorithm

Second International Conference on Computer Research and Development

Vikram Arkalgud Chandrasetty and Syed Mahfuzul Aziz
School of Electrical and Information Engineering, University of South Australia, Mawson Lakes, SA 5095, Australia
vikramac@ieee.org, mahfuz.aziz@unisa.edu.au

Abstract

In this paper, a reduced complexity Low-Density Parity-Check (LDPC) decoder is designed and implemented on FPGA using a modified 2-bit Min-Sum algorithm. Simulation results reveal that the proposed decoder gains 1.5 dB Eb/No at a bit error rate (BER) of 10^-5 and requires fewer decoding iterations compared to the original 2-bit Min-Sum algorithm. With a BER performance comparable to that of the 3-bit Min-Sum algorithm, the decoder implemented using the modified 2-bit Min-Sum algorithm saves about 18% of FPGA slices and can achieve an average throughput of 10.2 Gbps at 4 dB Eb/No.

Keywords: digital communication; error correction coding; iterative decoding; field programmable gate array; logic design

I. INTRODUCTION

Low-Density Parity-Check (LDPC) [1] codes have become one of the most attractive error correction codes due to their excellent performance [2] and suitability for high data rate applications, such as WiMax, DVB-S2 and so on [3]. The inherent structure of the LDPC code allows the decoder to achieve a high degree of parallelism in practical implementation [4]. LDPC decoding algorithms are primarily iterative and are based on the belief propagation message passing algorithm. The complexity of the decoding algorithm is highly critical for the overall performance of the LDPC decoder. Various algorithms have been proposed in the past to achieve a tradeoff between complexity and performance [5, 6]. The Sum-Product Algorithm (SPA) [7], a soft decision based message passing algorithm, can achieve the best performance, but with high decoding complexity. Bit-Flip, in contrast, is a hard decision based algorithm with the least decoding complexity, but suffers from poor performance [6].
The Min-Sum Algorithm (MSA) is a simplified version of SPA that has reduced implementation complexity with a slight degradation in performance [7]. The MSA performs simple arithmetic and logical operations, which makes it suitable for hardware implementation. However, the performance of the algorithm is significantly impacted by the quantization of the soft input messages used [8]. Reducing the quantization of the messages is important for reducing the implementation complexity and hardware resources of the decoder, but this advantage comes with a degradation in decoding performance. Performance issues and hardware implementation of such low complexity algorithms, especially the 2-bit MSA, have received limited coverage in the literature. This paper discusses the performance and hardware implementation complexity associated with 2-bit MSA. Modifications are proposed to improve the overall performance of the algorithm to a level comparable to that of 3-bit MSA. Simulation results reveal that the proposed Modified 2-bit Min-Sum (MMS2) algorithm achieves a significant improvement in decoding performance, measured as bit error rate (BER) and average decoding iterations, compared to 2-bit MSA. With a BER performance comparable to that of 3-bit MSA, the FPGA implementation of the proposed MMS2 can save up to 18% of slices and gives a 23% improvement in the maximum operating frequency of the LDPC decoder.

II. PROPOSED MODIFIED 2-BIT MIN-SUM ALGORITHM

Although the simplified check node operation in MSA has reduced complexity compared to SPA, the former still requires high precision messages to be exchanged between the decoding nodes in the decoder. This is important to achieve decoding performance comparable to that of SPA, with the least performance degradation. The level of quantization used in the soft channel messages, represented as Log-Likelihood Ratios (LLR), and in the extrinsic messages of MSA directly impacts the decoding performance. As the quantization length of the messages decreases, both the performance and the complexity of the algorithm reduce.
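To make this trade-off concrete, the sketch below compares the sum-product and min-sum check-node updates and shows a simple saturating quantizer for LLRs. The tanh-based sum-product rule and the sign/minimum rule are the standard textbook forms. The quantizer's step size, rounding and symmetric integer range are assumptions made for illustration and are not the schemes studied in [8]; all function names are hypothetical.

```python
import math


def spa_check(llrs):
    """Sum-product check-node update: one extrinsic output per input edge."""
    out = []
    for k in range(len(llrs)):
        prod = 1.0
        for l, x in enumerate(llrs):
            if l != k:
                prod *= math.tanh(x / 2.0)
        # Keep the argument of atanh strictly inside (-1, 1).
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
        out.append(2.0 * math.atanh(prod))
    return out


def min_sum_check(llrs):
    """Min-sum approximation: product of signs times minimum magnitude."""
    out = []
    for k in range(len(llrs)):
        others = [x for l, x in enumerate(llrs) if l != k]
        negatives = sum(1 for x in others if x < 0)
        sign = -1.0 if negatives % 2 else 1.0
        out.append(sign * min(abs(x) for x in others))
    return out


def quantize(llr, bits, step=1.0):
    """Uniform saturating quantizer (assumed scheme, for illustration only).

    Maps an LLR to an integer level in the symmetric range
    [-(2**(bits-1) - 1), +(2**(bits-1) - 1)].
    """
    max_level = 2 ** (bits - 1) - 1
    level = int(round(llr / step))
    return max(-max_level, min(max_level, level))


if __name__ == "__main__":
    messages = [2.0, -0.5, 3.0]
    print("SPA     :", [round(v, 3) for v in spa_check(messages)])
    print("Min-sum :", min_sum_check(messages))
    print("4-bit, 3-bit, 2-bit levels for LLR 5.3:",
          [quantize(5.3, b) for b in (4, 3, 2)])
```

The min-sum outputs have the same signs as the sum-product outputs but never smaller magnitudes, which is the source of the slight performance loss mentioned above. Shrinking `bits` narrows the range of levels a message can take, which is what reduces both routing width and decoding accuracy.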
Studies have shown that there is only a slight performance loss in going from 5-bit to 4-bit or even 3-bit quantization [8]. Using 2-bit quantized messages in MSA leads to a massive reduction in implementation complexity but suffers from a significant loss in decoder performance compared to 3-bit MSA. The performance of 2-bit MSA has been improved through the optimization reported in [9]. The performance is further improved by the Modified 2-bit Min-Sum (MMS2) algorithm proposed in this paper. The check node and variable node operations of the MMS2 algorithm are described as follows.

A. Variable Node Operation

The variable node operation is similar to that of the original Min-Sum algorithm [7]. The difference in the proposed algorithm is that the variable node (V_i) performs higher precision quantized LLR operations (LLR_n), but maps the computed result to a 2-bit message to be passed to the check nodes, as in (1). The 2-bit message consists of a sign bit and a magnitude bit representing the computed LLR sum. The mapping is based on a threshold (T) obtained from simulations. Depending on the message received from the check nodes (C_j), the 2-bit information is again mapped to constant values (±W or ±w) to perform the LLR sum operation in the variable node. These constant values for mapping are also obtained from simulations. The functions for mapping the 2-bit messages are shown in (2) and (3).

978-0-7695-03-6/10 $26.00 © 2010 IEEE. DOI 10.1109/ICCRD.2010.186

$$ V_i = g\Big( LLR_n + \sum_{j \neq i} f(C_j) \Big) \tag{1} $$

where n = 1, 2, ..., N (variable nodes) and i, j = 1, 2, ..., d_v (degree of variable node n).

$$ g(y) = \begin{cases} 01 & \text{if } y \ge T \\ 00 & \text{if } 0 \le y < T \\ 10 & \text{if } -T < y < 0 \\ 11 & \text{if } y \le -T \end{cases} \tag{2} $$

$$ f(x) = \begin{cases} +W & \text{if } x = 01 \\ +w & \text{if } x = 00 \\ -w & \text{if } x = 10 \\ -W & \text{if } x = 11 \end{cases} \tag{3} $$

where T is the optimized threshold for mapping, W is the optimized higher integer constant and w is the optimized lower integer constant, all obtained from simulations. Monte Carlo simulations are carried out to obtain the T, W and w values that provide the best decoding performance.

B. Check Node Operation

In MSA, the check node is expected to determine the product of the signs of the incoming messages and also find the minimum of the magnitudes of the input messages [7]. In the proposed MMS2, the product of the signs of the incoming messages is computed using an XOR operation (S_k) and the minimum is determined using an AND operation (M_k), as in (4) and (5). The check node output message (C_k) is obtained simply by concatenating the sign bit and the magnitude bit, as in (6). The message passing between the nodes continues till the parity check is satisfied or the maximum number of iterations is reached.

$$ S_k = V_1^{(s)} \oplus V_2^{(s)} \oplus \dots \oplus V_l^{(s)}, \quad l \neq k \tag{4} $$

$$ M_k = V_1^{(m)} \,\&\, V_2^{(m)} \,\&\, \dots \,\&\, V_l^{(m)}, \quad l \neq k \tag{5} $$

$$ C_k = \{ S_k, M_k \} \tag{6} $$

where l, k = 1, 2, ..., d_c (degree of check node); S is the sign bit of the check node message; M is the magnitude bit of the check node message; V_l^{(s)} is the sign bit of message l from the variable node; and V_l^{(m)} is the magnitude bit of message l from the variable node.

The message mapping in the variable node described above is similar to that presented in [9]. However, the proposed MMS2 algorithm eliminates the overhead of the scaling factor used in [9], uses higher precision LLRs for the variable node operation and incorporates simple logic for the check node operation. These modifications lead to a further improvement in performance and yet retain the reduced complexity of routing only 2-bit messages between the variable and check nodes in the LDPC decoder.

III. PERFORMANCE ANALYSIS

The performance of the proposed MMS2 algorithm has been evaluated by developing a software model using C programs in the MatLab environment. The LDPC codes were generated using the Progressive Edge Growth (PEG) algorithm [10]. Simulations were carried out assuming that the code words were modulated using Binary Phase Shift Keying (BPSK) and passed over an Additive White Gaussian Noise (AWGN) channel [11]. In [12], a ½ rate (3, 6) regular 1200-bit LDPC code with a maximum of 10 decoding iterations was used for the FPGA implementation of 3-bit MSA. This specification has been used for simulation and comparison of the proposed MMS2 algorithm. The corresponding FPGA implementation results are compared in section IV (A). The LLR quantization used for MMS2 is 4-bit. In the variable node, a threshold (T) of 2 is used for the 4-bit to 2-bit mapping, and the weights W=3 and w=1 are used for the 2-bit to 4-bit mapping. The BER performance of MMS2 compared to the original 2-bit and 3-bit MSA is shown in Fig. 1. It can be noted that MMS2 achieves a gain of 1.5 dB at 10^-5 BER over 2-bit MSA and suffers a loss of about 0.3 dB at 10^-5 BER relative to 3-bit MSA. A significant reduction in average decoding iterations for MMS2 compared to 2-bit MSA can be observed in Fig. 2.

Figure 1. BER performance of MMS2 compared to MSA

Figure 2. Average decoding iterations for MMS2 and MSA

IV. FPGA IMPLEMENTATION

A fully parallel LDPC decoder architecture was designed for the proposed MMS2 algorithm. The parameterized hardware model was developed using Verilog Hardware Description Language (HDL) and synthesized using the Xilinx synthesis tool. The behavioral and post synthesis simulations were carried out using ModelSim. The block diagram of the designed LDPC decoder is shown in Fig. 3. The decoder has a global Clock input and a synchronous Reset input. The maximum permissible number of iterations is determined by the value supplied at the MaxIter input, which can be set to a value in the range 0-15. When the Configure input is high, the MaxIter value is read. The LLRs are fed into the decoder using the Load control signal. The decoding process is initiated by the Start signal. After the decoding is completed, the Decoded Data can be obtained when indicated by the DataOut Ready signal. The receipt of data can be acknowledged on DataOut Ack to receive the next decoded bit. The number of iterations used for decoding can be obtained from the Used Iter port. The Decoder Status port indicates the progress (Active/Idle) of the decoder.

Figure 3. Block diagram of the designed LDPC decoder (inputs: LLR Input, Load, Start, MaxIter, Clock, Reset, Configure, DataOut Ack; outputs: Decoder Status, Decoded Data, Used Iter, DataOut Ready)

Note that the LLRs are loaded serially, one at a time, into the decoder. Similarly, the Decoded Data is latched serially, bit by bit. This technique is used because of the limited number of Input/Output ports available in the FPGA. It also provides flexibility for implementing LDPC decoders with variable code length without modifying the port configuration.

A. Comparative Analysis

A parallel architecture for a 1200-bit LDPC decoder, as described in section III, has been designed, synthesized, placed and routed for a Xilinx Virtex 4 (XC4VLX200) FPGA. The maximum operating clock frequency achievable for the decoder is 123 MHz. The throughput of the decoder is calculated based on the formula presented in [12]. This calculation excludes the serial load time of individual LLRs (before starting the decoding process) and the latch time of decoded data (after decoding is complete). At an average of 7.2 decoding iterations at 4 dB Eb/No (see Fig. 2), the proposed decoder can achieve an average throughput of 10.2 Gbps. A comparison of the proposed decoder to that presented in [12] is shown in Table I.

TABLE I. COMPARISON OF FULLY PARALLEL LDPC DECODERS

Parameter                        In [12]                          Proposed                         Improvement
LDPC Code                        ½ rate (3,6) regular 1200-bit    ½ rate (3,6) regular 1200-bit    -
Algorithm                        3-bit Min-Sum                    MMS2                             -
BER                              10^-5 at 3.6 dB                  10^-5 at 3.9 dB                  0.3 dB (loss)
FPGA                             Xilinx Virtex 4 (XC4VLX200)      Xilinx Virtex 4 (XC4VLX200)      -
Slices                           40,613                           33, 35                           18%
LUTs                             69,038                           58,053                           16%
Registers                        18,95                            15,691                           17%
Clock                            100 MHz                          123 MHz                          23%
Throughput (Avg.)                Not Available                    10.2 Gbps at 4 dB                -
Throughput (Min, 10 iterations)  6 Gbps                           7.4 Gbps                         23%
Results                          Synthesized, Placed and Routed   Synthesized, Placed and Routed   -
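The throughput formula from [12] is not reproduced in this paper. As a rough cross-check, the figures in Section IV.A are consistent with a fully parallel decoder that decodes one N-bit codeword per run and spends a fixed number of clock cycles on each iteration. The sketch below assumes throughput = N · f / (c · I) with c = 2 clock cycles per iteration. Both the formula and the value of c are assumptions, chosen because they reproduce the reported numbers; they are not taken from [12]. The function and parameter names are illustrative.

```python
def throughput_bps(code_length_bits, clock_hz, iterations, cycles_per_iteration=2):
    """Estimate decoded bits per second for a fully parallel decoder.

    Assumed model (not taken from [12]): one codeword of `code_length_bits`
    is decoded every `iterations * cycles_per_iteration` clock cycles.
    Serial load and unload time is excluded, as in the text.
    """
    if iterations <= 0 or cycles_per_iteration <= 0:
        raise ValueError("iterations and cycles_per_iteration must be positive")
    return code_length_bits * clock_hz / (iterations * cycles_per_iteration)


if __name__ == "__main__":
    # 1200-bit code, 123 MHz clock, 7.2 average iterations (proposed decoder).
    average = throughput_bps(1200, 123e6, 7.2)
    # 1200-bit code, 100 MHz clock, 10 iterations (decoder in [12]).
    reference = throughput_bps(1200, 100e6, 10)
    print(f"proposed, average iterations: {average / 1e9:.2f} Gbps")
    print(f"[12], 10 iterations:          {reference / 1e9:.2f} Gbps")
```

Under these assumptions, 1200 bits at 123 MHz with 7.2 average iterations gives about 10.25 Gbps, in line with the 10.2 Gbps quoted above, and 100 MHz at 10 iterations gives 6 Gbps.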

B. Implementation Results

The 1200-bit LDPC decoder presented above was not implemented on the FPGA, as a Xilinx Virtex 4 was not available. However, a smaller version of the decoder has been implemented using a Xilinx Virtex 5 FPGA development board. A ½ rate (3, 6) regular 648-bit LDPC code that complies with the WLAN standard [13] was chosen for implementation. A comprehensive testing environment was developed using RS232 serial communication [14] to test the decoder on the FPGA. The setup used to test the LDPC decoder is shown in Fig. 4. An RS232 transceiver module was embedded on the FPGA along with the LDPC decoder module to interface with the RS232 port. MatLab was used to communicate with the FPGA using the serial port. LLRs were generated and sent to the FPGA with appropriate control signals for decoding. The decoded data received via the same serial port was used to analyze the performance of the decoder. The BER performance and average iterations required by the decoder implemented on FPGA, compared to the software model, are shown in Fig. 5 and Fig. 6 respectively. The summary of FPGA implementation results of the LDPC decoder, including the RS232 serial communication module, is shown in Table II. At a maximum operating frequency of 113 MHz, the implemented LDPC decoder can achieve an average throughput of 5.4 Gbps with an average of 6.8 iterations at 4.25 dB Eb/No.

Figure 4. Block diagram of FPGA test setup for LDPC decoder (MatLab on a personal computer, serial port connection, RS232 Rx/Tx and LDPC decoder on the FPGA)

Figure 5. BER performance of LDPC decoder from FPGA

Figure 6. Average decoding iterations of LDPC decoder from FPGA

TABLE II. SUMMARY OF FPGA IMPLEMENTATION RESULTS

Resources    LDPC Decoder
Slices       7,755
LUTs         22,01
Registers    8,555
Clock        113 MHz
FPGA         Xilinx Virtex 5 (XC5VLX110T-3FF1136)

V. CONCLUSION

In this paper, a modified 2-bit Min-Sum algorithm is proposed to reduce the implementation complexity of LDPC decoders. It is shown that, with a slight degradation in performance of about 0.3 dB at a BER of 10^-5 compared to 3-bit Min-Sum, the proposed decoder leads to a significant saving in hardware resource utilization (about 18% fewer slices) and a higher throughput (23% higher at the maximum of 10 iterations). The performance of the proposed algorithm and its feasibility for practical systems are also verified by implementing a decoder suitable for WLAN. Therefore, the proposed LDPC decoder is a highly attractive solution for applications requiring high performance.
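As a compact recap of the node operations in Section II, the following Python sketch models them in software using the Section III settings (T = 2, W = 3, w = 1). It is an illustrative model, not the authors' Verilog or C implementation. The function names, the (sign, magnitude) tuple representation of a 2-bit message and the handling of values exactly at the thresholds are assumptions.

```python
T = 2      # mapping threshold (Section III)
W_HIGH = 3  # higher constant W (Section III)
W_LOW = 1   # lower constant w (Section III)


def g(y, threshold=T):
    """Map an LLR sum to a 2-bit message (sign bit, magnitude bit)."""
    if y >= threshold:
        return (0, 1)   # 01: positive, high magnitude
    if y >= 0:
        return (0, 0)   # 00: positive, low magnitude
    if y > -threshold:
        return (1, 0)   # 10: negative, low magnitude
    return (1, 1)       # 11: negative, high magnitude


def f(message, high=W_HIGH, low=W_LOW):
    """Map a 2-bit check-node message back to a signed integer constant."""
    sign, magnitude = message
    value = high if magnitude else low
    return -value if sign else value


def variable_node(llr, check_messages):
    """Return the 2-bit message for each edge and the full LLR sum.

    Each outgoing message excludes the incoming message on its own edge.
    """
    total = llr + sum(f(c) for c in check_messages)
    outgoing = [g(total - f(c)) for c in check_messages]
    return outgoing, total


def check_node(variable_messages):
    """XOR the sign bits and AND the magnitude bits of all other edges."""
    outgoing = []
    for k in range(len(variable_messages)):
        sign, magnitude = 0, 1
        for l, (s, m) in enumerate(variable_messages):
            if l != k:
                sign ^= s
                magnitude &= m
        outgoing.append((sign, magnitude))
    return outgoing


if __name__ == "__main__":
    print(check_node([(0, 1), (1, 1), (0, 0)]))
    print(variable_node(2, [(0, 1), (1, 0), (0, 0)]))
```

The AND over single magnitude bits acts as a minimum because a magnitude bit of 0 (low) on any other edge forces the output magnitude low, which is why the check node needs no comparators.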

ACKNOWLEDGMENT

The authors wish to acknowledge Dr Mark Ho of the School of Electrical and Information Engineering, University of South Australia, for his advice on carrying out the performance simulations.

REFERENCES

[1] R. Gallager, Low-density parity-check codes. IRE Transactions on Information Theory, 1962. 8(1): p. 21-28.
[2] D.J.C. MacKay and R.M. Neal, Near Shannon limit performance of low density parity check codes. Electronics Letters, 1997. 33(6): p. 457-458.
[3] Tetsuo Nozawa (2005) LDPC Adopted for Use in Comms, Broadcasting, HDDs. Nikkei Electronics Asia.
[4] G.L.L. Nicolas Fau (2008) LDPC (Low Density Parity Check) - A Better Coding Scheme for Wireless PHY Layers. Design and Reuse Industry Article.
[5] S. Papaharalabos and P.T. Mathiopoulos, Simplified sum-product algorithm for decoding LDPC codes with optimal performance. Electronics Letters, 2009. 45(2): p. 116-117.
[6] N. Miladinovic and M.P.C. Fossorier, Improved bit-flipping decoding of low-density parity-check codes. IEEE Transactions on Information Theory, 2005. 51(4): p. 1594-1606.
[7] A. Anastasopoulos. A comparison between the sum-product and the min-sum iterative detection algorithms based on density evolution. in IEEE Global Telecommunications Conference. 2001.
[8] R. Zarubica, et al. Efficient quantization schemes for LDPC decoders. in IEEE Military Communications Conference. 2008.
[9] Z. Cui and Z. Wang, Improved low-complexity low-density parity-check decoding. IET Communications, 2008. 2(8): p. 1061-1068.
[10] X.-Y. Hu. Software to Construct PEG LDPC code. 2008 [cited 2009 May]; Available from: http://www.inference.phy.cam.ac.uk/mackay/peg_ecc.html.
[11] J.G. Proakis, Digital communications. 5th ed., ed. M. Salehi. 2008, New York: McGraw-Hill.
[12] R. Zarubica, S.G. Wilson, and E. Hall. Multi-Gbps FPGA-Based Low Density Parity Check (LDPC) Decoder Design. in IEEE Global Telecommunications Conference. 2007.
[13] IEEE 802.11n Wireless LAN Medium Access Control MAC and Physical Layer PHY specifications. 2006, IEEE 802.11n-D1.0.
[14] RS232 Tutorial on Data Interface and Cables. 2009 [cited 2009 Sep]; Available from: http://www.arcelect.com/rs232.htm.