Quantization of Three-Bit Logic for LDPC Decoding


Proceedings of the World Congress on Engineering and Computer Science 2011 Vol II, October 19-21, 2011, San Francisco, USA

Raymond Moberly and Michael E. O'Sullivan

Abstract: This paper presents two related three-bit quantizations for sum-product algorithm LDPC decoding that are suitable for programmable logic. The key aspect of our decoder design is the combining of the parity-check and variable node update steps into a single computation. The performance and the hardware requirements for an FPGA implementation are considered and compared to the work of Planjery et al.

I. INTRODUCTION

Low Density Parity Check (LDPC) codes are well suited for error-correction applications. However, the challenge is to find strategies that will enable efficient implementations while ensuring good performance. Iterative decoder designs using a small number of quantization bits appear in the works of T. Zhang and Parhi[1], Planjery et al[2], and Z. Zhang et al[3]. Each team has devised a design suitable for digital logic implementation.

In this paper we present quantizations for a sum-product algorithm LDPC decoder using the receiver sampling resolution available on a Gaussian channel. We examine decoder performance of various three-bit quantizations, finding that the best choice of quantization changes as the channel conditions change. Our design combines the parity check and variable-node update steps into a single computation. This paper presents synthesis results showing the latency and footprint of the key computational component of our decoder design.

Our experiments are with a rate-1/2, length-1162 binary LDPC code; it is from a family of codes that our research group has generated using permutation matrices[4][5]. This methodology permits the construction of codes of large girth. The cyclic permutation structure is known to have efficient hardware implementations[6][7].

II. SCOPE

The Sum Product Algorithm (SPA) was simulated on a computer cluster, using look-up tables based upon three-bit quantization, for 10 iterations.
Our quantization, with 10 iterations, surpasses the performance of Planjery et al with 100 iterations. We determine the per-iteration computational latency and evaluate trade-offs between iterations and computation per iteration, which contribute to total latency and gain. We select these as the comparison criteria in our conclusion and we discuss other potential criteria; in an engineering application, decoder design could be optimized for throughput or power consumption.

Manuscript received July 21, 2011; revised August 16, 2011. This research was supported in part by NSF grants CCF 0635382 and CHE 0216563. FPGA hardware and development tools were provided by the Altera Corporation. R. Moberly is with the Computational Science Program, San Diego State University, San Diego, CA 92182, USA (email: moberly@sciences.sdsu.edu). M. E. O'Sullivan is with the Mathematics Department, San Diego State University, San Diego, CA 92182, USA (email: mosulliv@math.sdsu.edu).

A. FPGA Implementation

The Field Programmable Gate Array (FPGA) offers a very rapid pathway to concept development; it is also well-suited to computation with non-standard precision and variable data types that are not available in microprocessors. The Application Specific Integrated Circuit (ASIC) also offers customized precision, but there is a high development cost. In contrast to ASIC development, FPGA development is low-cost, easily debugged, and correctable. When implementing the sum-product algorithm in an FPGA, the designer has a choice of precision and quantization; precision can be increased at the cost of computational speed. Size, power, and latency are important engineering factors in communication systems. Reducing precision reduces the coding gain but accelerates the computation.

An FPGA solution [1] in the literature achieved LDPC decoding using operands with just 5 bits. Our own prior research [8] explored tradeoffs between the number of bits of precision and the number of decoding iterations. Synthesis results, such as those presented in our present paper, help to explore the capability and performance of an FPGA-based decoder.
The LDPC decoder for a regular code has a very repetitive structure, performing identical operations on each bit of the received code word. Our analysis, implementation, and synthesis present the computation for a single code symbol. The length-1162 LDPC code that we tested our decoder with is a rate-1/2 (6,3)-regular code. Each variable node outputs three updated messages; we implemented the logic of just one of these output messages in order to determine the latency, and then implemented all three outputs to observe the consequent speed and size. Logic synthesis can seek to maximize speed, or minimize chip area, or optimize some combined weighted function of speed and chip area.

The Altera DE2 development board was selected for this work; it was requested from and provided by the Altera Corporation as a university research grant. The FPGA on the DE2 board is the Cyclone II EP2C35F672C6N; it has a substantial number of programmable logic elements (33,216).

B. Formulations of the Iterative Algorithm

We looked at the SPA as a cycle in our ISIT 2006 paper[9]. Figure 1 shows the iterative algorithm formulations cycling through probability representations, where the variable and parity check messages can be expressed in terms of probabilities, differences $\delta = P(0) - P(1)$, ratios $\rho = P(0)/P(1)$, or log-likelihood ratios $\lambda = \log \rho$.

Fig. 1. Iterative SPA Formulations in the Literature (panels: Levine, MacKay, Jimenez).

Fig. 2. Our published Formulation of the Sum-Product Algorithm (Moberly / O'Sullivan).

Fig. 3. FER and BER for AWGN and BSC Channels (curves: AWGN 100 iters FER, AWGN 100 iters BER, BSC 100 iters FER, BSC 100 iters BER).

We compared various formulations of the SPA[10][11][12][13] which were mathematically equivalent but computationally different. One of the conclusions of that paper was that formulations which represent probabilities as differences ($\delta$) or as log-likelihood ratios (LLR) offered significant computational advantages. These resulted in fewer CPU instructions. Transforming multiplication operations into addition operations in the log domain increases performance on computer processors with arithmetic logic units that can perform addition more rapidly than multiplication[14][15][16][17][18]. The advantage is definitely significant when working with 32-bit and 64-bit variables; but what if there are only a few bits of precision in use? For limited precision, the difference between $O(n_{bits})$ addition and $O(n_{bits}^2)$ multiplication might not be significant.

As Han and Sunwoo showed[19], the LLR calculations involve one particularly obstructive computation, an inverse hyperbolic tangent function; their limited precision computation involves a table for this calculation. Zhang et al have also looked at fixed-point LLR quantizations using 5, 6 and 7 bits[3]; in these implementations, the hyperbolic tangent function is a substantial part of the design effort and computational work. The cycle for the formulation we introduced is shown in Figure 2.
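The probability representations discussed above (difference, ratio, and log-likelihood ratio) are related by simple identities. The following is a minimal Python sketch of those conversions; the function names are hypothetical and chosen only for illustration, and the symbols delta, rho, and lam are assumed to correspond to the difference, ratio, and LLR forms. It also shows where the inverse hyperbolic tangent enters when moving from the difference form to the LLR form.

```python
import math


def representations(p0):
    """Return (delta, rho, lam) for a binary probability P(0) = p0, 0 < p0 < 1."""
    p1 = 1.0 - p0
    delta = p0 - p1          # difference form, in (-1, +1)
    rho = p0 / p1            # ratio form, in (0, inf)
    lam = math.log(rho)      # log-likelihood ratio form
    return delta, rho, lam


def llr_from_delta(delta):
    """lam = log((1 + delta) / (1 - delta)), which equals 2 * atanh(delta)."""
    return 2.0 * math.atanh(delta)


def delta_from_llr(lam):
    """Inverse mapping: delta = tanh(lam / 2)."""
    return math.tanh(lam / 2.0)
```

For example, `representations(0.75)` gives a difference of 0.5 and a ratio of 3, and `llr_from_delta(0.5)` returns the same LLR as the third element of that tuple.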
In this paper, instead of looking at the parity check and variable-node update as two separate actions, we will present the cycle as a single computation with one quantization applied per iteration.

C. Comparing BSC and AWGN

The Additive White Gaussian Noise (AWGN) channel and the Binary Symmetric Channel (BSC) both appear in simulation efforts as representatives of real-world channel conditions. This paper compares decoding results on a Gaussian channel with competing published results that use the BSC. The equivalence computation sets the BSC bit crossover probability equal to $\frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{2 E_b/N_0}\right)$, where $E_b/N_0$ is the signal to noise ratio (SNR) that characterizes a Gaussian channel. For decoders with floating-point belief propagation, there is an almost 2 dB difference in performance. Truncation to a hard decision at the receiver results in the 2 dB loss that differentiates the BSC and AWGN channels, as shown in figure 3. The difference is about the same whether the decoder is evaluated based upon bit error rate (BER) or frame error rate (FER). Considering this loss, it seems a natural move to collect soft decisions at the receiver if the decoder is going to work with soft information internally. Our decoder design assumes a soft-decision receiver with three bits of precision and our specified quantizations.

III. PLANJERY'S BEYOND BELIEF PROPAGATION

We replicated the quantized three-bit algorithm specified in Planjery's paper[2]. We reproduced the 100 iteration results from their paper using several published codes (e.g. benchmarks) and ran simulations for our own code with both 10 and 100 iterations for a range of SNR values. These are shown in figure 4 (BER) and figure 5 (FER). Each graph shows the applicable reference curves from figure 3. Planjery also produced, using a specialized three-bit proprietary quantization and algorithm, improved results through an approach designed to overcome the influence of trapping sets.
With Shiva Planjery's gracious cooperation we were able to obtain the resulting performance curve of their proprietary decoder applied to the LDPC code that came from our own permutation constructions. Transformed from crossover probability to an SNR axis, this curve is shown in figures 4 (BER) and 5 (FER). The quantized algorithm of Planjery et al compares favorably to a floating-point belief-propagation decoder operating upon hard decision samples from the receiver. These proprietary performance curves are repeated in the charts for our quantizations, figures 8 and 9, for comparison.

Fig. 4. BER for Published and Proprietary decoders of Planjery et al (curves: Planjery as Published 10 iters, Planjery as Published 100 iters, Planjery Proprietary 100 iters).

Fig. 5. FER for Published and Proprietary decoders of Planjery et al (curves: Planjery as Published 10 iters, Planjery as Published 100 iters, Planjery Proprietary 100 iters).

The Planjery et al three-bit algorithm begins with a single bit quantization (a hard decision) at the receiver. It performs another quantization at each parity check, and then quantizes again at each variable node update. Three-bit messages are used for the parity check operation inputs and outputs. Other algorithms in the literature quantize in a similar fashion, two quantizations per iteration, as illustrated in Figure 6.

Fig. 6. Quantization of the Variable Nodes and the Parity Computation (1 bit hard decision at the receiver; 3 bit quantization at the parity check and at the variable node update).

A. Synthesis of the Planjery-Vasic 3-bit Decoder

We implemented the three-bit logic of their parity checks and variable node update in Verilog HDL. The synthesis results, targeting our Cyclone II FPGA, were reported by the Altera Quartus II software. The single bit computation used 138 logic elements and had a longest path delay of 20.489 nanoseconds. If we were to compute 1162 bits (the length of this LDPC code) simultaneously, the footprint would expand to 160356 logic elements. If we were to compute, sequentially, the 100 iterations used in Planjery and Vasic's simulations, the decoding latency would be multiplied to 2.0489 microseconds. This synthesis result gives a baseline for the implementation cost of their published algorithm. Their second stage proprietary rule, giving them significant additional coding gain, increases the implementation cost by an amount unknown to us. The quantizations that we propose in the following sections require more logic elements, but our performance results show the benefits of those additional implementation costs.

IV. OUR WORK: ONE COMPUTATION PER ITERATION

The SPA is typically described as two computational steps. If we consider the iteration to be a combined step instead of the two separate steps, the formulation still has mathematical equivalence but the computation changes. Instead of applying quantization twice in an iteration, one quantization is applied. The intermediate quantization is not specified, but quantization is implied; that implied quantization is described later in the synthesis results subsection. Figure 7 illustrates the whole-iteration computation that we worked with.

Fig. 7. Quantization of the Variable Nodes. One Quantization per Iteration (3 bit soft decision at the receiver; parity check not explicitly quantized; 3 bit quantization at the variable node update over the whole iteration).

A. Quantization Scales

Our quantization values are expressed in $\delta$ representation, which transforms [0,1] probability values to the range of [-1,+1]. Five-bit quantizations proved to be very effective in LDPC decoding in our previous effort. A quantization scheme using the sigmoid function, $S(x) = \frac{1}{1+e^{-x}}$, was among those that we used to determine the discrete scale values[8]. In this paper we present two related three-bit quantizations, based upon sigmoid function evaluations at certain intervals: $x = 1.5 \times \{1, 2, 3, 4\} = \{1.5, 3.0, 4.5, 6.0\}$ and $x = 2.0 \times \{1, 2, 3, 4\} = \{2.0, 4.0, 6.0, 8.0\}$. These show particular promise for decoder quantization over a tested range of Gaussian channel values. The step thresholds, T, that we chose are the means between the step heights. The step-function mapping of $\delta$ assigns the quantized value $s_i$, choosing $i$ such that $\delta$ lies between $t_{i-1}$ and $t_i$. The two tested quantization scales are:

Table 1. Sigmoid "635" Scale

Quantization Step "S" ($\delta$) Values
  s_-4     s_-3     s_-2     s_-1     s_1      s_2      s_3      s_4
  -0.995   -0.98    -0.90    -0.64    0.64     0.90     0.98     0.995

Step Threshold "T" Values
  t_-3     t_-2     t_-1     0        t_1      t_2      t_3
  -0.99    -0.95    -0.77    0.0      0.77     0.95     0.99

Table 2. Sigmoid "762" Scale

Quantization Step "S" ($\delta$) Values
  s_-4     s_-3     s_-2     s_-1     s_1      s_2      s_3      s_4
  -0.999   -0.995   -0.96    -0.76    0.76     0.96     0.995    0.999

Step Threshold "T" Values
  t_-3     t_-2     t_-1     0        t_1      t_2      t_3
  -0.99    -0.98    -0.86    0.0      0.86     0.98     0.99

Notice how, for both scales, the precision is concentrated in the regions of greatest certainty; the step functions have finely spaced steps at the two extremes. These families of quantizations suggest an implementation strategy for varying the decoder precision; such a strategy could compete with other adaptive error correction technologies that have been developed (rate compatible codes, etc.). The two quantizations tested differ only in how the x values of the sigmoid S(x) are selected.

B. Decoder Performance

We found that one of our quantization scales was better for lower SNR conditions and the other was better for high SNR conditions. A decoder intended to work well for a wide range of conditions might be designed to adapt its quantization as the channel conditions change.
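The two quantization scales of Section IV-A can be regenerated numerically. The sketch below is a minimal Python illustration, assuming that the step values are the sigmoid evaluations mapped into the difference representation as 2 S(x) - 1 and that each threshold is the midpoint between adjacent steps; the function names are hypothetical. The printed tables appear to be rounded or truncated to two or three digits, so the computed values agree with them only to within about 0.01.

```python
import math


def sigmoid_scale(x_step, levels=4):
    """Positive step values 2*S(k * x_step) - 1 for k = 1..levels."""
    return [2.0 / (1.0 + math.exp(-x_step * k)) - 1.0
            for k in range(1, levels + 1)]


def midpoints(steps):
    """Positive thresholds taken as the means between adjacent step heights."""
    return [(a + b) / 2.0 for a, b in zip(steps, steps[1:])]


def quantize(delta, steps, thresholds):
    """Map delta in (-1, +1) to a signed step value using the thresholds."""
    level = sum(1 for t in thresholds if abs(delta) >= t)
    return math.copysign(steps[level], delta)


steps_635 = sigmoid_scale(1.5)        # roughly 0.635, 0.905, 0.978, 0.995
steps_762 = sigmoid_scale(2.0)        # roughly 0.762, 0.964, 0.995, 0.999
thresholds_635 = midpoints(steps_635)
thresholds_762 = midpoints(steps_762)
```

The first step of each scale (about 0.635 and 0.762) matches the scale names used in the text, which supports the reading of the step values assumed here.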
As channel conditions change, the current noise level could be estimated from the sample variance; we haven't yet built the logic needed to do this, but we understand it to be a common practice in signal processing.

The SPA simulation results are shown in figures 8 (BER) and 9 (FER). The graphs show comparable results from a simulation by Planjery, using their proprietary three-bit decoder upon our own length 1162-bit LDPC code.

Fig. 8. BER for Sigmoid "635" and "762", compared with Planjery et al (curves: Sigmoid "635" 10 iters, Sigmoid "762" 10 iters, Planjery Proprietary 100 iters).

Fig. 9. FER for Sigmoid "635" and "762", compared with Planjery et al (curves: Sigmoid "635" 10 iters, Sigmoid "762" 10 iters, Planjery Proprietary 100 iters).

The small vertical bars on the graph data points show the upper end of a 95% confidence interval for each of our simulation result values. These confidence intervals can be reduced with longer simulations (more samples). The confidence intervals that we present are small enough to firmly assert the following claims:

- The "635" quantization outperforms the "762" quantization over the [1.0, 3.5] range.
- The "762" quantization outperforms the "635" quantization over the [4.0, 5.0] range.
- At the $10^{-4}$ BER level, using our chosen rate-1/2 LDPC code, both of our decoder quantizations outperform the Planjery and Vasic proprietary algorithm. The best BER gain is about 0.9 dB better than their approach. FER gains, somewhat less substantial, are also seen over most of the tested region.
- A decoder adapting between our two quantizations outperforms their approach over the entire tested range.
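The whole-iteration computation of Section IV, for one outgoing variable-node message of a (6,3)-regular code, can be sketched in floating point as follows. This is only an illustration under stated assumptions, not the table-lookup logic that was synthesized: it uses the standard sum-product identities in the difference representation (a parity check multiplies its inputs, and two independent estimates a and b combine as (a + b) / (1 + ab)), it takes the step and threshold values as printed in Table 2, and all function names are hypothetical.

```python
import math

# Positive step values and thresholds of the Sigmoid "762" scale (Table 2).
STEPS = [0.76, 0.96, 0.995, 0.999]
THRESHOLDS = [0.86, 0.98, 0.99]


def quantize(delta):
    """Map a difference value in (-1, +1) to a signed three-bit step value."""
    level = sum(1 for t in THRESHOLDS if abs(delta) >= t)
    return math.copysign(STEPS[level], delta)


def parity_check(deltas):
    """Parity-check rule in difference form: the product of the inputs."""
    return math.prod(deltas)


def combine(a, b):
    """Variable-node rule in difference form for two independent estimates."""
    return (a + b) / (1.0 + a * b)


def whole_iteration(channel, check_a_inputs, check_b_inputs):
    """One outgoing message: two parity checks, one update, one quantization."""
    a = parity_check(check_a_inputs)   # five inputs, not explicitly quantized
    b = parity_check(check_b_inputs)   # five inputs, not explicitly quantized
    return quantize(combine(combine(channel, a), b))
```

With a channel sample of 0.76 and all ten neighbouring messages at 0.76, `whole_iteration` returns the step value 0.96; the single call to `quantize` at the end reflects the one-quantization-per-iteration idea described in the text.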

C. Synthesis Results

In our quantization approach, as described above, limited precision is applied to the receiver sampling and to the variable node updates. Using this, we implemented a combined parity check and variable node update calculation using a mixture of calculations, logic, and a table lookup. The three-bit (6,3) parity check results in one of 112 possible output values, far less than the $2^{(3 \cdot 5)}$ input combinations. Another way to express this is as an implied quantization: the parity check output can be digitally represented using seven bits, since $112 < 2^7$. The table lookup determines an update by specifying $112 \times 112 \times 8 = 100352$ three-bit values. There are additional symmetries which make it unnecessary to store this many computed table values. Our technique for finding the simplifications was to allow the Altera Quartus II synthesis tool to do the simplifying for us. For our tested quantizations, the tool consistently digested the table lookup (specified in Verilog HDL) and produced a result with a complexity reduced by a factor of about 1000. The cost for each was an overnight (8 1/2 hour) synthesis, place, and routing run.

The synthesis returns the number of logic elements (LE) which are required for the design, and it computes, after placing and routing in an optimal manner, the longest path delay (LPD) between any pair among the inputs and outputs. The inverse of the LPD is the highest appropriate clock frequency for a clock-synchronous design. The logic for calculating one variable update using two associated parity checks synthesized to less than 5,000 logic elements. When the expressed design was expanded to include all three associated parity checks and compute all three of the resulting variable node updates, the design footprint more than doubled, but it did not triple. The delay increased by less than 20%. The three-message logic synthesized to a blend of shared computation and parallelism.

Table 3. Synthesis Results for each Quantization

  Quantization            msgs   LEs      LPD (ns)   max clk (MHz)
  Planjery's algorithm    3      138      20.489     48.8
  Sigmoid "635" Scale     1      4,743    36.255     27.5
                          3      11,111   43.099     23.2
  Sigmoid "762" Scale     1      4,471    37.518     26.6
                          3      10,070   42.485     23.5

The chosen Cyclone II FPGA is too small to handle the 1162 replications of this design needed to process all of the bits of a code word simultaneously. A table lookup implementation is a good candidate for pipelining, so a fast full-codeword design is entirely feasible.

Our synthesized design has twice the per-iteration latency of Planjery's published design (per our synthesis results). This computed factor of only two may be an overoptimistic comparison because some of both delays may be due to the overhead of directing input to and receiving output from the FPGA chip itself. To obtain a fairer comparison using these single iteration synthesis figures, we would omit some input/output portion of the latency from the per-iteration measure. We determined an upper bound for this contribution by implementing a very minimalist circuit, just an XOR of all of the inputs that also drives all of our outputs. That circuit, with three-bit inputs, consumed 39 logic elements and had a latency of 17.560 ns. If we subtract off this latency time value from both longest path figures, then the Planjery/Vasic design adds 2.929 ns to this minimal latency (to get the 20.489 ns total) and the "0.762" Sigmoid adds 24.925 ns to this minimal latency. The ratio of these two time durations is approximately eight to one.

Since our decoder exceeds, in 10 iterations, the decoding gain of Planjery's proprietary decoder with 100 iterations, we compute the total decoding time for one bit to be $10 \times 24.925 = 249.3$ ns for our design and $100 \times 2.929 = 292.9$ ns for Planjery's published design. The timing advantage of our Sigmoid decoder is 15%. The logic circuitry of our decoder, with its quantizations, was larger than the logic to implement their decoder, but our decoding operation was faster and obtains better decoding results for the tested regions of SNR, BER and FER.
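The counts and timing figures quoted in this subsection can be re-derived with a few lines of arithmetic. The Python sketch below is illustrative only; the variable names are hypothetical, and the way the 112 distinct parity-check outputs are counted (an overall sign times an unordered choice of five magnitudes from the four step magnitudes) is an assumption that happens to reproduce the stated number rather than a derivation given in the text.

```python
from itertools import combinations_with_replacement

# Assumed counting argument: the product of five three-bit inputs depends only
# on its sign and on the unordered multiset of the five magnitudes.
magnitude_multisets = list(combinations_with_replacement(range(4), 5))
parity_outputs = 2 * len(magnitude_multisets)         # 112
input_combinations = 2 ** (3 * 5)                     # 32768
table_entries = parity_outputs * parity_outputs * 8   # 100352

# Latency figures from Table 3 and from the minimal XOR circuit, in ns.
io_overhead_ns = 17.560
planjery_lpd_ns = 20.489
sigmoid_762_lpd_ns = 42.485

planjery_net_ns = planjery_lpd_ns - io_overhead_ns    # about 2.929
sigmoid_net_ns = sigmoid_762_lpd_ns - io_overhead_ns  # about 24.925
latency_ratio = sigmoid_net_ns / planjery_net_ns      # about 8.5

ours_total_ns = 10 * sigmoid_net_ns                   # 10 iterations
planjery_total_ns = 100 * planjery_net_ns             # 100 iterations
timing_advantage = 1.0 - ours_total_ns / planjery_total_ns
```

Evaluating `timing_advantage` gives a little under 0.15, consistent with the 15% figure quoted above.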
Our computation for one code symbol fits within the selected FPGA; we could readily use this to decode a full codeword in a serial fashion. Alternatively, we could increase throughput by using a larger chip or by redesigning for an ASIC. Using a larger chip would give us greater throughput and parallelization opportunities; these can be explored more thoroughly under the engineering constraints of a specific application.

D. Further Work

With longer simulations we may determine how far down these performance curves go, explore more thoroughly the possible error floors of our approach, and determine which of the approaches pushes down the error floor more. We have an alternative to longer and expensive simulations via our ongoing work in the importance sampling techniques that can be used to approximate simulations of very low error rate conditions.

We previously studied the effect of varying the number of decoding iterations with this particular permutation-based LDPC code; we found that a decoding by 10 iterations was usually conclusive [8]. Simulations of our new quantized design in this paper with 100 iterations (instead of just 10) resulted in only minor additional gains (1/4 dB in terms of BER and 1/3 dB gain per the FER curves).

It bears mentioning that our simulation has the flexibility to use different quantizations at each iteration. We have experimented with this capability but we are without conclusive results.

V. COMPARING DECODERS

Our results, using three-bit samples from a Gaussian channel, have 0.5 to 0.9 dB better gain than the hard-decision receiver approach used by Planjery et al[2]. A conclusion from this is that a receiver that can sample incoming symbols with three bits is better than one that makes a hard decision.

The fidelity available at the receiver sampling point should not be discarded. The quantization selected for three bits of precision does make a difference, and considering the channel conditions is important when trying to choose the best possible quantization. Because we found that one of our quantizations was better in the lower range and the other was better in the higher range, we proposed a decoder that adapts between our two quantizations according to a frequent estimation of the channel conditions. The 33,216 LE capacity of our FPGA could accommodate the logic of both of our quantizations, leaving enough additional room for the logic to measure the channel and select the quantization adaptively. The adaptive decoder can beat Planjery's decoder by approximately 0.9 dB over a substantial BER range ($10^{-2}$ to $10^{-7}$).

Although the single iteration latency is greater than that of the Planjery et al design, our success with 10 iterations means that a decoder solution that is better for a range of conditions can be reached in less time. We believe there is a potential for parallelization and pipelining, but even working through the bits one at a time in a serial fashion, the 430 ns per bit processing would support a decoding throughput over 2 Mbps. This FPGA-based capability is adequate to fulfill the diverse narrowband requirements and achieve the lower threshold for wideband operation of a contemporary radio system[20].

Our synthesis assessment is of Planjery's published design. We make two assumptions in order to compare our decoder to their proprietary design: (1) that the proprietary enhancements increase latency beyond that of the published design and (2) that the proprietary design requires additional logic. The comparison favors our decoder on two of three evaluation criteria. The comparison is summarized in the following table.

Table 4. Implementation Comparison

                              Our designs   Planjery's design
                                            pub.       prop.
  Decode 1 Bit (ns)           249.3         292.9
  Gain @ 10^-4 BER (dB)       +8.5          +6.5       +7.6
  Chip Area (LEs)             21,181        138

REFERENCES

[1] T. Zhang, K.K. Parhi, A 54 Mbps (3,6)-regular FPGA LDPC decoder. IEEE Workshop on Signal Processing Systems 2002 (SIPS '02), pages 127-132, Oct 2002
[2] S.K. Planjery, S.K. Chilappagari, B. Vasic, D. Declercq, L. Danjean, Iterative Decoding Beyond Belief Propagation. Information Theory and Applications Workshop (ITA), pages 1-10, 2010
[3] Z. Zhang, L. Dolecek, B. Nikolic, V. Anantharam, M. Wainwright, Design of LDPC decoders for improved low error rate performance: quantization and algorithm choices. Communications, IEEE Transactions on, Volume: 57, Issue: 11, pages 3258-3268, 2009
[4] M.E. O'Sullivan, R. Smarandache, High-rate, short length, (3,3s)-regular LDPC of girth 6 and 8. Information Theory, Proceedings, IEEE International Symposium on, page 59, 2003
[5] M. Greferath, M.E. O'Sullivan, R. Smarandache, Construction of good LDPC codes using dilation matrices. Information Theory, Proceedings, International Symposium on, page 235, 2004
[6] Y. Chen, K.K. Parhi, Overlapped Message Passing for Quasi-Cyclic Low-Density Parity Check Codes. IEEE Transactions on Circuits and Systems, v. 51 no. 6, 2004
[7] M.M. Mansour, N.R. Shanbhag, Low-power VLSI decoder architectures for LDPC codes. Proceedings of the 2002 international symposium on Low power electronics and design (ISLPED), pages 284-289, August 2002
[8] R. Moberly, M. O'Sullivan, Representing probabilities with limited precision for iterative soft-decision LDPC decoding. 2006 Wireless Personal Multimedia Conference, September 2006
[9] R. Moberly, M. O'Sullivan, Computational performance of various formulations of the iterative soft-decision decoder algorithm. 2006 IEEE International Symposium on Information Theory, pages 1703-1707, July 2006
[10] J. Pearl, Probabilistic Reasoning in Intelligent Systems - Networks of Plausible Inference. Morgan Kaufmann, 1988
[11] B. Levine, R.R. Taylor, H. Schmit, Implementation of near Shannon limit error-correcting codes using reconfigurable hardware. 2000 IEEE Symposium on Field-Programmable Custom Computing Machines, pages 217-226, April 2000
[12] D. Davey, M.C. MacKay, Low-density parity check codes over GF(q). IEEE Communications Letters, 2:165-167, June 1998
[13] A. Jimenez, K.Sh. Zigangirov, Periodic time-varying convolutional codes with low-density parity-check matrices. Proceedings 1998 IEEE International Symposium on Information Theory, page 305, Aug 1998
[14] M. Gokhale, P. Graham, Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays, Chapters 1-4, Springer, Dordrecht, 2005
[15] M. Flynn, S.F. Oberman, Advanced Computer Arithmetic Design, Chapter 2, John Wiley and Sons Inc., New York, 2001
[16] B. Parhami, Computer Arithmetic - Algorithms and Hardware Designs, Chapters 1, 3, and 18, Oxford University Press, New York, 2000
[17] I. Koren, Computer Arithmetic Algorithms, Chapter 6, A.K. Peters Ltd., Natick, 2002
[18] M. Ercegovac, T. Lang, Digital Arithmetic, Chapter 8, Morgan Kaufmann, San Francisco, 2004
[19] J.H. Han, M.H. Sunwoo, Simplified sum-product algorithm using piecewise linear function approximation for low complexity LDPC decoding. ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, pages 302-309, February 2009
[20] U.S. Department of Defense Joint Requirements Oversight Council, Joint Tactical Radio System (JTRS) Operational Requirements Document (ORD). April 2003, available at http://www.fas.org/man/dod-101/sys/land/docs/jtr23_mar.htm