Proceedings of the World Congress on Engineering and Computer Science 2011 Vol II, October 19-21, 2011, San Francisco, USA

Quantization of Three-Bit Logic for LDPC Decoding

Raymond Moberly and Michael E. O'Sullivan

Abstract

This paper presents two related three-bit quantizations for sum-product algorithm LDPC decoding that are suitable for programmable logic. The key aspect of our decoder design is the combining of the parity-check and variable node update steps into a single computation. The performance and the hardware requirements for an FPGA implementation are considered and compared to the work of Planjery et al.

I. INTRODUCTION

Low Density Parity Check (LDPC) codes are well suited for error-correction applications. However, the challenge is to find strategies that will enable efficient implementations while ensuring good performance. Iterative decoder designs using a small number of quantization bits appear in the works of T. Zhang and Parhi, Planjery et al., and Z. Zhang et al. Each team has devised a design suitable for digital logic implementation. In this paper we present quantizations for a sum-product algorithm LDPC decoder using the receiver sampling resolution available on a Gaussian channel. We examine decoder performance of various three-bit quantizations, finding that the best choice of quantization changes as the channel conditions change. Our design combines the parity check and variable-node update steps into a single computation. This paper presents synthesis results showing the latency and footprint of the key computational component of our decoder design.

Our experiments are with a rate-1/2, length-1162 binary LDPC code; it is from a family of codes that our research group has generated using permutation matrices. This methodology permits the construction of codes of large girth. The cyclic permutation structure is known to have efficient hardware implementations.

II. SCOPE

The Sum Product Algorithm (SPA) was simulated on a computer cluster, using look-up tables based upon three-bit quantization, for 10 iterations.
Our quantization, with 10 iterations, surpasses the performance of Planjery et al. with 100 iterations. We determine the per-iteration computational latency and evaluate trade-offs between iterations and computation per iteration, which contribute to total latency and gain. We select these as the comparison criteria in our conclusion and we discuss other potential criteria; in an engineering application, decoder design could be optimized for throughput or power consumption.

Manuscript received July 21, 2011; revised August 16, This research was supported in part by NSF grants CCF and CHE FPGA hardware and development tools were provided by the Altera Corporation.
R. Moberly is with the Computational Science Program, San Diego State University, San Diego, CA 92182, USA email:
M. E. O'Sullivan is with the Mathematics Department, San Diego State University, San Diego, CA 92182, USA email:

A. FPGA Implementation

The Field Programmable Gate Array (FPGA) offers a very rapid pathway to concept development; it is also well-suited to computation with non-standard precision and variable data types that are not available in microprocessors. The Application Specific Integrated Circuit (ASIC) also offers customized precision, but there is a high development cost. In contrast to ASIC development, FPGA development is low-cost, easily debugged, and correctable. When implementing the sum-product algorithm in an FPGA, the designer has a choice of precision and quantization; precision can be increased at the cost of computational speed. Size, power, and latency are important engineering factors in communication systems. Reducing precision reduces the coding gain but accelerates the computation. An FPGA solution in the literature achieved LDPC decoding using operands with just 5 bits. Our own prior research explored tradeoffs between the number of bits of precision and the number of decoding iterations. Synthesis results, such as those presented in our present paper, help to explore the capability and performance of an FPGA-based decoder.
The LDPC decoder for a regular code has a very repetitive structure, performing identical operations on each bit of the received code word. Our analysis, implementation, and synthesis present the computation for a single code symbol. The length-1162 LDPC code that we tested our decoder with is a rate-1/2 (6,3)-regular code. Each variable node outputs three updated messages; we implemented the logic of just one of these output messages in order to determine the latency, and then implemented all three outputs to observe the consequent speed and size. Logic synthesis can seek to maximize speed, or minimize chip area, or optimize some combined weighted function of speed and chip area.

The Altera DE2 development board was selected for this work; it was requested from and provided by the Altera Corporation as a university research grant. The FPGA on the DE2 board is the Cyclone II EP2C35F672C6N; it has a substantial number of programmable logic elements (33,216).

B. Formulations of the Iterative Algorithm

We looked at the SPA as a cycle in our ISIT 2006 paper. Figure 1 shows the iterative algorithm formulations cycling through probability representations, where the variable and parity check messages can be expressed in terms of probabilities, differences $\delta = P(0) - P(1)$, ratios $\rho = P(0)/P(1)$, or log-likelihood ratios $\lambda = \log \rho$.

[Fig. 1. Iterative SPA Formulations in the Literature. Panels: Levine, MacKay, Jimenez; each cycle passes through the representations 0,1; δ; ρ; λ.]

[Fig. 2. Our published Formulation of the Sum-Product Algorithm. Panel: Moberly / O'Sullivan; representations 0,1; δ; ρ; λ.]

[Fig. 3. FER and BER for AWGN and BSC Channels. Curves: AWGN 100 iters FER, AWGN 100 iters BER, BSC 100 iters FER, BSC 100 iters BER. Vertical axis: FER or BER.]

We compared various formulations of the SPA which were mathematically equivalent but computationally different. One of the conclusions of that paper was that formulations which represent probabilities as differences ($\delta$) or as log-likelihood ratios (LLR) offered significant computational advantages. These resulted in fewer CPU instructions. Transforming multiplication operations into addition operations in the log domain increases performance on computer processors with arithmetic logic units that can perform addition more rapidly than multiplication. The advantage is definitely significant when working with 32-bit and 64-bit variables; but what if there are only a few bits of precision in use? For limited precision, the difference between $O(n)$ addition and $O(n^2)$ multiplication, for $n$ bits, might not be significant. As Han and Sunwoo showed, the LLR calculations involve one particularly obstructive computation, an inverse hyperbolic tangent function; their limited precision computation involves a table for this calculation. Zhang et al. have also looked at fixed-point LLR quantizations using 5, 6 and 7 bits; in these implementations, the hyperbolic tangent function is a substantial part of the design effort and computational work. The cycle for the formulation we introduced is shown in Figure 2.
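The probability representations discussed in this subsection are related by simple identities, which the following Python sketch makes concrete. The function names are hypothetical (they do not come from the paper), and the two parity-check rules are the standard textbook forms, not the authors' implementation. The sketch shows why the LLR form needs a hyperbolic tangent and its inverse, while the difference form needs only a product.

```python
import math
from typing import Iterable


def delta_from_p0(p0: float) -> float:
    """Difference representation: delta = P(0) - P(1), with P(1) = 1 - P(0)."""
    return p0 - (1.0 - p0)


def ratio_from_p0(p0: float) -> float:
    """Ratio representation: rho = P(0) / P(1)."""
    return p0 / (1.0 - p0)


def llr_from_delta(delta: float) -> float:
    """Log-likelihood ratio: lambda = log(rho) = log((1 + delta) / (1 - delta))."""
    return math.log((1.0 + delta) / (1.0 - delta))


def delta_from_llr(llr: float) -> float:
    """Inverse of llr_from_delta: delta = tanh(lambda / 2)."""
    return math.tanh(llr / 2.0)


def parity_check_delta(deltas: Iterable[float]) -> float:
    """Parity-check output in the difference domain: a plain product."""
    out = 1.0
    for d in deltas:
        out *= d
    return out


def parity_check_llr(llrs: Iterable[float]) -> float:
    """Same check in the LLR domain: needs tanh and an inverse tanh."""
    prod = 1.0
    for l in llrs:
        prod *= math.tanh(l / 2.0)
    return 2.0 * math.atanh(prod)


if __name__ == "__main__":
    incoming = [0.5, 0.2]                      # two incoming delta messages
    via_delta = parity_check_delta(incoming)   # product of the two deltas
    via_llr = delta_from_llr(parity_check_llr([llr_from_delta(d) for d in incoming]))
    print(via_delta, via_llr)                  # the two routes agree up to rounding
```

Both routes compute the same message; the difference is only in which operations the hardware or processor has to provide.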
In this paper, instead of looking at the parity check and variable-node update as two separate actions, we will present the cycle as a single computation with one quantization applied per iteration.

C. Comparing BSC and AWGN

The Additive White Gaussian Noise (AWGN) channel and the Binary Symmetric Channel (BSC) both appear in simulation efforts as representatives of real-world channel conditions. This paper compares decoding results on a Gaussian channel with competing published results that use the BSC. The equivalence computation is $\epsilon = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{2 E_b/N_0}\right)$, where $\epsilon$ is the BSC bit crossover probability and $E_b/N_0$ is the signal to noise ratio (SNR) that characterizes a Gaussian channel. For decoders with floating-point belief propagation, there is an almost 2 dB difference in performance. Truncation to a hard decision at the receiver results in the 2 dB loss that differentiates the BSC and AWGN channels, as shown in figure 3. The difference is about the same whether the decoder is evaluated based upon bit error rate (BER) or frame error rate (FER). Considering this loss, it seems a natural move to collect soft decisions at the receiver if the decoder is going to work with soft information internally. Our decoder design assumes a soft-decision receiver with three bits of precision and our specified quantizations.

III. PLANJERY'S BEYOND BELIEF PROPAGATION

We replicated the quantized three-bit algorithm specified in Planjery's paper. We reproduced the 100 iteration results from their paper using several published codes (e.g. benchmarks) and ran simulations for our own code with both 10 and 100 iterations for a range of SNR values. These are shown in figure 4 (BER) and figure 5 (FER). Each graph shows the applicable reference curves from figure 3. Planjery also produced, using a specialized three-bit proprietary quantization and algorithm, improved results through an approach designed to overcome the influence of trapping sets.
With Shiva Planjery's gracious cooperation we were able to obtain the resulting performance curve of their proprietary decoder applied to the LDPC code that came from our own permutation constructions. Transformed from crossover probability to an SNR axis, this curve is shown in figures 4 (BER) and 5 (FER). The quantized algorithm of Planjery et al. compares favorably to a floating-point belief-propagation decoder operating upon hard decision samples from the receiver. These proprietary performance curves are repeated in the charts for our quantizations, figures 8 and 9, for comparison.

[Fig. 4. BER for Published and Proprietary decoders of Planjery et al. Curves: Planjery as Published 10 iters, Planjery as Published 100 iters, Planjery Proprietary 100 iters.]

[Fig. 5. FER for Published and Proprietary decoders of Planjery et al. Curves: Planjery as Published 10 iters, Planjery as Published 100 iters, Planjery Proprietary 100 iters.]

The Planjery et al. three-bit algorithm begins with a single bit quantization (a hard decision) at the receiver. It performs another quantization at each parity check, and then quantizes again at each variable node update. Three-bit messages are used for the parity check operation inputs and outputs. Other algorithms in the literature quantize in a similar fashion, two quantizations per iteration, as illustrated in Figure 6.

[Fig. 6. Quantization of the Variable Nodes and the Parity Computation. Labels: 1-bit hard decision at the receiver; 3-bit quantized parity check; 3-bit quantized variable node update; whole iteration.]

A. Synthesis of the Planjery Vasic 3-bit Decoder

We implemented the three-bit logic of their parity checks and variable node update in Verilog HDL. The synthesis results, targeting our Cyclone II FPGA, were reported by the Altera Quartus II software. The single bit computation used 138 logic elements and had a longest path delay of nanoseconds. If we were to compute 1162 bits (the length of this LDPC code) simultaneously, the footprint would expand to logic elements. If we were to compute, sequentially, the 100 iterations used in Planjery and Vasic's simulations, the decoding latency would be multiplied to microseconds. This synthesis result gives a baseline for the implementation cost of their published algorithm. Their second stage proprietary rule, giving them significant additional coding gain, increases the implementation cost by an amount unknown to us. The quantizations that we propose in the following sections require more logic elements, but our performance results show the benefits of those additional implementation costs.

IV. OUR WORK: ONE COMPUTATION PER ITERATION

The SPA is typically described as two computational steps. If we consider the iteration to be a combined step instead of two separate steps, the formulation still has mathematical equivalence but the computation changes. Instead of applying quantization twice in an iteration, one quantization is applied. The intermediate quantization is not specified, but quantization is implied; that implied quantization is described later in the synthesis results subsection. Figure 7 illustrates the whole-iteration computation that we worked with.

[Fig. 7. Quantization of the Variable Nodes. One Quantization per Iteration. Labels: 3-bit soft decision at the receiver; parity check not explicitly quantized; 3-bit quantized variable node update; whole iteration.]

A. Quantization Scales

Our quantization values are expressed in $\delta$ representation, which transforms [0,1] probability values to the range of [-1,+1]. Five-bit quantizations proved to be very effective
in LDPC decoding in our previous effort. A quantization scheme using the sigmoid function, $S(x) = \frac{1}{1+e^{-x}}$, was among those that we used to determine the discrete scale values. In this paper we present two related three-bit quantizations, based upon sigmoid function evaluations at certain intervals: $x = 1.5 \cdot \{1, 2, 3, 4\} = \{1.5, 3.0, 4.5, 6.0\}$ and $x = 2.0 \cdot \{1, 2, 3, 4\} = \{2.0, 4.0, 6.0, 8.0\}$. These show particular promise for decoder quantization over a tested range of Gaussian channel SNR values. The step thresholds, T, that we chose are the means between the step heights. The step-function mapping of $\delta$ assigns the quantized value $s_i$, choosing $i$ such that $t_{i-1} \le \delta \le t_i$. The two tested quantization scales are:

Table 1. Sigmoid "635" Scale
Quantization Step "S" ($\delta$) Values: $s_{-4}$ $s_{-3}$ $s_{-2}$ $s_{-1}$ $s_1$ $s_2$ $s_3$ $s_4$
Step Threshold "T" Values: $t_{-3}$ $t_{-2}$ $t_{-1}$ 0 $t_1$ $t_2$ $t_3$

Table 2. Sigmoid "762" Scale
Quantization Step "S" ($\delta$) Values: $s_{-4}$ $s_{-3}$ $s_{-2}$ $s_{-1}$ $s_1$ $s_2$ $s_3$ $s_4$
Step Threshold "T" Values: $t_{-3}$ $t_{-2}$ $t_{-1}$ 0 $t_1$ $t_2$ $t_3$

Notice how, for both scales, the precision is concentrated in the regions of greatest certainty; the step functions have finely spaced steps at the two extremes. These families of quantizations suggest an implementation strategy for varying the decoder precision; such a strategy could compete with other adaptive error correction technologies that have been developed (rate compatible codes, etc.). The two quantizations tested differ only in how the x values of the sigmoid S(x) are selected.

B. Decoder Performance

We found that one of our quantization scales was better for lower SNR conditions and the other was better for high SNR conditions. A decoder intended to work well for a wide range of conditions might be designed to adapt its quantization as the channel conditions change.
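The construction of the two scales in Section IV-A can be sketched in a few lines of Python. This is a minimal sketch under one stated assumption: that a sigmoid value $S(x)$, read as a probability in [0,1], is mapped to the difference representation by $\delta = 2S(x) - 1$. That mapping is not spelled out in the text, but it is consistent with the scale names, since it gives a first step of about 0.635 for $x = 1.5$ and about 0.762 for $x = 2.0$. The function names and the tie-breaking rule at a threshold are hypothetical.

```python
import bisect
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """S(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def build_scale(step: float, levels: int = 4) -> Tuple[List[float], List[float]]:
    """Return (step values, thresholds) for a symmetric 2*levels-step scale.

    ASSUMPTION: step heights are 2*S(x) - 1 evaluated at x = step * {1..levels}.
    Thresholds are the means between adjacent step heights, as stated in the text.
    """
    positive = [2.0 * sigmoid(step * k) - 1.0 for k in range(1, levels + 1)]
    steps = [-s for s in reversed(positive)] + positive
    thresholds = [(a + b) / 2.0 for a, b in zip(steps, steps[1:])]
    return steps, thresholds


def quantize(delta: float, steps: List[float], thresholds: List[float]) -> float:
    """Map delta in [-1, +1] to the step whose threshold interval contains it.

    A delta exactly equal to a threshold is sent to the lower step here;
    that tie rule is an arbitrary choice for this sketch.
    """
    return steps[bisect.bisect_left(thresholds, delta)]


if __name__ == "__main__":
    for name, step in (("635", 1.5), ("762", 2.0)):
        s, t = build_scale(step)
        print(name, [round(v, 4) for v in s], [round(v, 4) for v in t])
```

With eight step values, each quantized message fits in three bits, and the middle threshold of each scale is zero.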
As channel conditions change, the current noise level could be estimated from the sample variance; we haven't yet built the logic needed to do this, but we understand it to be a common practice in signal processing.

The SPA simulation results are shown in figures 8 (BER) and 9 (FER). The graphs show comparable results from a simulation by Planjery, using their proprietary three-bit decoder upon our own length 1162-bit LDPC code.

[Fig. 8. BER for Sigmoid "635" and "762", compared with Planjery et al. Curves: Sigmoid "635" 10 iters, Sigmoid "762" 10 iters, Planjery Proprietary 100 iters.]

[Fig. 9. FER for Sigmoid "635" and "762", compared with Planjery et al. Curves: Sigmoid "635" 10 iters, Sigmoid "762" 10 iters, Planjery Proprietary 100 iters.]

The small vertical bars on the graph data points show the upper end of a 95% confidence interval for each of our simulation result values. These confidence intervals can be reduced with longer simulations (more samples). The confidence intervals that we present are small enough to firmly assert the following claims:

The "635" quantization outperforms the "762" quantization over the [1.0, 3.5] SNR range.
The "762" quantization outperforms the "635" quantization over the [4.0, 5.0] SNR range.
At the $10^{-4}$ BER level, using our chosen rate-1/2 LDPC code, both of our decoder quantizations outperform the Planjery and Vasic proprietary algorithm. The best BER gain is about 0.9 dB better than their approach. FER gains, somewhat less substantial, are also seen over most of the tested region.
A decoder adapting between our two quantizations outperforms their approach over the entire tested range.
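The adaptive decoder suggested by these claims could select its scale from a running noise estimate. The Python sketch below is only an illustration of that idea, not logic the authors built. It assumes unit-amplitude BPSK samples (so that the mean of $y^2$ is $1 + \sigma^2$), the textbook relation $E_b/N_0 = 1/(2R\sigma^2)$ for code rate $R$, that the ranges in the claims are in dB, and a hypothetical switch-over point of 3.75 dB chosen between the two ranges. All function and constant names are hypothetical.

```python
import math
from typing import Sequence

SWITCH_DB = 3.75  # HYPOTHETICAL switch-over, between the [1.0, 3.5] and [4.0, 5.0] ranges


def estimate_ebn0_db(samples: Sequence[float], rate: float = 0.5) -> float:
    """Estimate Eb/N0 in dB from received samples.

    ASSUMPTIONS: BPSK symbols of amplitude +/-1 in zero-mean Gaussian noise,
    so E[y^2] = 1 + sigma^2, and Eb/N0 = 1 / (2 * rate * sigma^2).
    """
    mean_square = sum(y * y for y in samples) / len(samples)
    sigma2 = max(mean_square - 1.0, 1e-12)  # guard against a non-positive estimate
    return 10.0 * math.log10(1.0 / (2.0 * rate * sigma2))


def choose_scale(ebn0_db: float, switch_db: float = SWITCH_DB) -> str:
    """Pick the "635" scale at lower SNR and the "762" scale at higher SNR."""
    return "635" if ebn0_db < switch_db else "762"


if __name__ == "__main__":
    # Synthetic samples whose second moment corresponds exactly to Eb/N0 = 2 dB.
    sigma = math.sqrt(1.0 / (2.0 * 0.5 * 10.0 ** 0.2))
    samples = [1 + sigma, 1 - sigma, -1 + sigma, -1 - sigma]
    snr = estimate_ebn0_db(samples)
    print(round(snr, 3), choose_scale(snr))
```

In hardware the estimate would more plausibly be a running accumulator over recent samples; the selection rule itself is a single comparison.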
C. Synthesis Results

In our quantization approach, as described above, limited precision is applied to the receiver sampling and to the variable node updates. Using this, we implemented a combined parity check and variable node update calculation using a mixture of calculations, logic, and a table lookup. The three-bit (6,3) parity check results in one of 112 possible output values, far less than the $2^{(3 \cdot 5)}$ input combinations. Another way to express this is as an implied quantization: the parity check output can be digitally represented using seven bits, since $112 < 2^7$. The table lookup determines an update by specifying = three-bit values. There are additional symmetries which make it unnecessary to store this many computed table values. Our technique for finding the simplifications was to allow the Altera Quartus II synthesis tool to do the simplifying for us. For our tested quantizations, the tool consistently digested the table lookup (specified in Verilog HDL) and produced a result with a complexity reduced by a factor of about The cost for each was an overnight (8 1/2 hour) synthesis, place, and routing run.

The synthesis returns the number of logic elements (LE) required for the design, and it computes, after placing and routing in an optimal manner, the longest path delay (LPD) between any pair among the inputs and outputs. The inverse of the LPD is the highest appropriate clock frequency for a clock-synchronous design. The logic for calculating one variable update using two associated parity checks synthesized to less than 5,000 logic elements. When the expressed design was expanded to include all three associated parity checks and compute all three of the resulting variable node updates, the design footprint more than doubled, but it did not triple. The delay increased by less than 20%. The three-message logic synthesized to a blend of shared computation and parallelism.

Table 3. Synthesis Results for each Quantization
Columns: msgs, LEs, LPD (ns), max clk (MHz)
Planjery's algorithm
Sigmoid "635" Scale 1 4, ,
Sigmoid "762" Scale 1 4, ,

The chosen Cyclone II FPGA is too small to handle the 1162 replications of this design needed to process all of the bits of a code word simultaneously. A table lookup implementation is a good candidate for pipelining, so a fast full-codeword design is entirely feasible.

Our synthesized design has twice the per-iteration latency of Planjery's published design (per our synthesis results). This computed factor of only two may be an overoptimistic comparison because some of both delays may be due to the overhead of directing input to and receiving output from the FPGA chip itself. To obtain a fairer comparison using these single iteration synthesis figures, we would omit some input/output portion of the latency from the per iteration measure. We determined an upper bound for this contribution by implementing a very minimalist circuit, just an XOR of all of the inputs that also drives all of our outputs. That circuit, with three-bit inputs, consumed 39 logic elements and had a latency of ns. If we subtract off this latency time value from both longest path figures, then the Planjery/Vasic design adds ns to this minimal latency (to get the ns total) and the "0.762" Sigmoid adds to this minimal latency. The ratio of these two time durations is approximately eight to one. Since our decoder exceeds, in 10 iterations, the decoding gain of Planjery's proprietary decoder with 100 iterations, we compute the total decoding time for one bit to be $10 \times 24.925 = 249.3$ ns for our design and $100 \times 2.929 = 292.9$ ns for Planjery's published design. The timing advantage of our Sigmoid decoder is 15%.

The logic circuitry of our decoder, with its quantizations, was larger than the logic to implement their decoder, but our decoding operation was faster and obtains better decoding results for the tested regions of SNR, BER and FER. Our computation for one code symbol fits within the selected FPGA; we could readily use this to decode a full codeword in a serial fashion.
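Two pieces of arithmetic from this subsection can be checked with a short Python sketch. For the count of 112, it assumes the check output is the product of five incoming $\delta$ messages, each taking one of eight signed levels, and that two products are equal only when they use the same multiset of magnitudes and the same overall sign (true for generic scale values, but an assumption here). For the timing, it uses the per-iteration figures 24.925 ns and 2.929 ns exactly as they appear in the totals quoted above. The function names are hypothetical.

```python
from itertools import product

# Signed indices standing for the eight step values s_-4 .. s_-1, s_1 .. s_4.
LEVELS = [-4, -3, -2, -1, 1, 2, 3, 4]


def count_parity_outputs(num_inputs: int = 5) -> int:
    """Count distinct products of num_inputs three-bit messages.

    Each product is identified by its overall sign and the sorted magnitudes
    of its factors (ASSUMPTION: distinct multisets give distinct products).
    """
    classes = set()
    for combo in product(LEVELS, repeat=num_inputs):
        negatives = sum(1 for v in combo if v < 0)
        sign = -1 if negatives % 2 else 1
        magnitudes = tuple(sorted(abs(v) for v in combo))
        classes.add((sign, magnitudes))
    return len(classes)


def total_latency_ns(iterations: int, per_iteration_ns: float) -> float:
    """Serial decoding time for one bit: iterations times per-iteration latency."""
    return iterations * per_iteration_ns


if __name__ == "__main__":
    print(len(LEVELS) ** 5, count_parity_outputs())  # input combinations vs. outputs
    ours = total_latency_ns(10, 24.925)
    theirs = total_latency_ns(100, 2.929)
    print(ours, theirs, 1.0 - ours / theirs)         # relative timing advantage
```

The enumeration covers all $8^5 = 2^{15}$ input combinations, which is small enough to run exhaustively.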
Alternatively, we could increase throughput by using a larger chip or by redesigning for an ASIC. Using a larger chip would give us greater throughput and parallelization opportunities; these can be explored more thoroughly under the engineering constraints of a specific application.

D. Further Work

With longer simulations we may determine how far down these performance curves go, explore more thoroughly the possible error floors of our approach, and determine which of the approaches pushes down the error floor more. We have an alternative to longer and expensive simulations via our ongoing work in the importance sampling techniques that can be used to approximate simulations of very low error rate conditions. We previously studied the effect of varying the number of decoding iterations with this particular permutation-based LDPC code; we found that a decoding by 10 iterations was usually conclusive. Simulations of our new quantized design in this paper with 100 iterations (instead of just 10) resulted in only minor additional gains (1/4 dB in terms of BER and 1/3 dB gain per the FER curves). It bears mentioning that our simulation has the flexibility to use different quantizations at each iteration. We have experimented with this capability but we are without conclusive results.

V. COMPARING DECODERS

Our results, using three-bit samples from a Gaussian channel, have 0.5 to 0.9 dB better gain than the hard-decision receiver approach used by Planjery et al. A conclusion from this is that a receiver that can sample incoming symbols with three bits is better than one that makes a hard decision.
The fidelity available at the receiver sampling point should not be discarded. The quantization selected for three bits of precision does make a difference, and considering the channel conditions is important when trying to choose the best possible quantization. Because we found that one of our quantizations was better in the lower SNR range and the other was better in the higher SNR range, we proposed a decoder that adapts between our two quantizations according to a frequent estimation of the channel conditions. The 33,216 LE capacity of our FPGA could accommodate the logic of both of our quantizations, leaving enough additional room for the logic to measure the channel and select the quantization adaptively. The adaptive decoder can beat Planjery's decoder by approximately 0.9 dB over a substantial BER range ($10^{-2}$ to $10^{-7}$). Although the single iteration latency is greater than that of the Planjery et al. design, our success with 10 iterations means that a decoder solution that is better for a range of conditions can be reached in less time. We believe there is a potential for parallelization and pipelining, but even working through the bits one at a time in a serial fashion, the 430 ns per bit processing would support a decoding throughput over 2 Mbps. This FPGA-based capability is adequate to fulfill the diverse narrowband requirements and achieve the lower threshold for wideband operation of a contemporary radio system.

Our synthesis assessment is of Planjery's published design. We make two assumptions in order to compare our decoder to their proprietary design: (1) that the proprietary enhancements increase latency beyond that of the published design and (2) that the proprietary design requires additional logic. The comparison favors our decoder on two of three evaluation criteria. The comparison is summarized in the following table.

Table 4. Implementation Comparison
Columns: Our design; Planjery's designs (pub., prop.)
Decode 1 Bit (ns)
BER (dB)
Chip Area (LEs) 21,

REFERENCES

T. Zhang, K.K. Parhi, A 54 Mbps (3,6)-regular FPGA LDPC decoder. IEEE Workshop on Signal Processing Systems 2002 (SIPS '02), pages , Oct 2002

S.K. Planjery, S.K. Chilappagari, B. Vasic, D. Declercq, L. Danjean, Iterative Decoding Beyond Belief Propagation. Information Theory and Applications Workshop (ITA), pages 1-10, 2010

Z. Zhang, L. Dolecek, B. Nikolic, V. Anantharam, M. Wainwright, Design of LDPC decoders for improved low error rate performance: quantization and algorithm choices. IEEE Transactions on Communications, Volume 57, Issue 11, pages , 2009

M.E. O'Sullivan, R. Smarandache, High-rate, short length, (3,3s)-regular LDPC of girth 6 and 8. Information Theory, Proceedings, IEEE International Symposium on, page 59, 2003

M. Greferath, M.E. O'Sullivan, R. Smarandache, Construction of good LDPC codes using dilation matrices. Information Theory, Proceedings, International Symposium on, page 235, 2004

Y. Chen, K.K. Parhi, Overlapped Message Passing for Quasi-Cyclic Low-Density Parity Check Codes. IEEE Transactions on Circuits and Systems, v. 51 no. 6, 2004

M.M. Mansour, N.R. Shanbhag, Low-power VLSI decoder architectures for LDPC codes. Proceedings of the 2002 International Symposium on Low Power Electronics and Design (ISLPED), pages , August 2002

R. Moberly, M. O'Sullivan, Representing probabilities with limited precision for iterative soft-decision LDPC decoding. Wireless Personal Multimedia Conference, September 2006

R. Moberly, M. O'Sullivan, Computational performance of various formulations of the iterative soft-decision decoder algorithm. IEEE International Symposium on Information Theory, pages , July 2006

J. Pearl, Probabilistic Reasoning in Intelligent Systems - Networks of Plausible Inference. Morgan Kaufmann, 1988

B. Levine, R.R. Taylor, H. Schmit, Implementation of near Shannon limit error-correcting codes using reconfigurable hardware. IEEE Symposium on Field-Programmable Custom Computing Machines, pages , April 2000

M.C. Davey, D. MacKay, Low-density parity check codes over GF(q). IEEE Communications Letters, 2: , June 1998

A. Jimenez, K.Sh. Zigangirov, Periodic time-varying convolutional codes with low-density parity-check matrices. Proceedings 1998 IEEE International Symposium on Information Theory, page 305, Aug 1998

M. Gokhale, P. Graham, Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays, Chapters 1-4, Springer, Dordrecht, 2005

M. Flynn, S.F. Oberman, Advanced Computer Arithmetic Design, Chapter 2, John Wiley and Sons Inc., New York, 2001

B. Parhami, Computer Arithmetic - Algorithms and Hardware Designs, Chapters 1, 3, and 18, Oxford University Press, New York, 2000

I. Koren, Computer Arithmetic Algorithms, Chapter 6, A.K. Peters Ltd., Natick, 2002

M. Ercegovac, T. Lang, Digital Arithmetic, Chapter 8, Morgan Kaufmann, San Francisco, 2004

J.H. Han, M.H. Sunwoo, Simplified sum-product algorithm using piecewise linear function approximation for low complexity LDPC decoding. ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, pages , February 2009

U.S. Department of Defense Joint Requirements Oversight Council, Joint Tactical Radio System (JTRS) Operational Requirements Document (ORD). April 2003, available at 101/sys/land/docs/jtr23_mar.htm