Performance of a Low-Complexity Turbo Decoder and its Implementation on a Low-Cost, 16-Bit Fixed-Point DSP

Ken Gracie, Stewart Crozier, Andrew Hunt, John Lodge
Communications Research Centre
3701 Carling Avenue, P.O. Box 11490, Station H, Ottawa, Ontario, K2H 8S2
Phone: (613) 990-5846, FAX: (613) 990-6339
E-mail: ken.gracie@crc.ca

Abstract

This paper presents the bit error rate, packet error rate, and throughput performance of a turbo decoder implemented on the Analog Devices ADSP-2181, a 16-bit fixed-point digital signal processing (DSP) chip. A simplified decoding algorithm is described, and example performance is given for block sizes between 64 and 512 information bits with a number of different code rates. Some implementation issues are also discussed.

1.0 Introduction

Recent years have seen considerable interest in turbo codes as an effective method of performing error correction in communications systems [1,2,3]. While turbo decoding is nominally very computationally complex, key optimizations can be exploited to dramatically reduce the amount of processing required. This allows low-cost DSP chips to be used as decoding engines for this powerful class of error-correcting codes. A very efficient turbo decoder using the Analog Devices ADSP-2181, a 16-bit fixed-point processor, has been developed to demonstrate this fact.

The decoding algorithm used is a form of iterative a posteriori probability (APP) decoding, also referred to as maximum a posteriori (MAP) decoding in the literature, implemented in the log domain [3,4,5,6,7]. The APP algorithm finds an estimate of the probability that the information bit is a 0 (or equivalently a 1) at each bit time given the entire received signal [], in contrast to the Viterbi algorithm, which performs maximum likelihood sequence estimation (MLSE) [8]. The APP or MAP decoding method naturally lends itself to providing the soft estimates needed for iterative decoding.
The decoder is implemented on its own separate processor and transfers data packets via a serial port. The received data is double-buffered to ensure that maximum throughput can be achieved. For 4 full decoding iterations, throughput on a 40 MIPS version of the ADSP-2181 was found to be approximately 16.8 kbps for all of the block sizes that were considered.

The paper presents bit and packet error rate results for block sizes between 64 and 512 information bits. The effect on performance of block size, code rate, and number of iterations is illustrated. These findings are compared to the performance of K=7 and K=9 Viterbi decoders, which have also been implemented on the same platform. The complexity of the K=9 Viterbi decoder is comparable to that of the turbo decoder implementation performing 4 decoding iterations.

Using a 16-bit fixed-point processor means that normalization and precision become important issues. It has been found that performance is essentially identical to that of a 32-bit C simulation when as few as 9 bits are used to represent the channel samples.

Section 2 contains a brief description of the decoding algorithm, including the structure of both the encoder and decoder. The description of the decoder includes a summary of the max-log-MAP algorithm used in each constituent decoder as well as a discussion of implementation issues. Section 3 presents bit and packet error performance as well as throughput results. Section 4 gives the conclusions.

2.0 The Turbo Codec

The structure of the turbo encoder is shown in Figure 1. Two K=5, rate 1/2 recursive systematic convolutional (RSC) encoders are used in parallel, one encoding the information bits directly and the other encoding an interleaved version of the information bits. The parity produced by these two encoders is punctured to achieve the desired overall code rate. A set of K-1 termination bits is used to return RSC1 to the all-zeroes state at the end of each data block. These termination bits are also interleaved and encoded by RSC2. Note that because of the interleaver, RSC2 terminates in an arbitrary state.

Figure 1: The turbo encoder with punctured parity.

A well-known approach to turbo decoding which makes use of APP or MAP decoding is shown in Figure 2. X is the set of systematic channel samples, Y1 is the set of unpunctured parity samples corresponding to RSC1, and Y2 is the set of unpunctured parity samples corresponding to RSC2.
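The encoder structure of Figure 1 can be illustrated with a minimal Python sketch. Several details here are assumptions rather than facts from the paper: the octal polynomials 23 and 35 are used with 23 taken as the feedback polynomial, the state is packed with the most recent register bit in the most significant position, and the names `rsc_step`, `rsc_encode`, and `turbo_encode` are hypothetical. Puncturing is omitted, so the sketch returns the unpunctured streams.

```python
# Sketch of a K=5 recursive systematic convolutional (RSC) encoder with
# trellis termination. The choice of feedback polynomial is an assumption.
FEEDBACK = 0o23      # 1 + D^3 + D^4, assumed recursive (feedback) polynomial
FEEDFORWARD = 0o35   # 1 + D + D^2 + D^4, assumed parity polynomial
MEMORY = 4           # K - 1 delay elements
STATE_MASK = (1 << MEMORY) - 1


def parity(value):
    """Return the XOR of all bits of a non-negative integer."""
    return bin(value).count("1") & 1


def rsc_step(state, bit):
    """Advance the encoder by one input bit; return (next_state, parity_bit)."""
    feedback_bit = bit ^ parity(state & FEEDBACK & STATE_MASK)
    register = (feedback_bit << MEMORY) | state
    return register >> 1, parity(register & FEEDFORWARD)


def rsc_encode(bits, terminate=True):
    """Encode bits; optionally append K-1 bits that drive the state to zero."""
    state, systematic, par = 0, [], []
    for bit in bits:
        state, p = rsc_step(state, bit)
        systematic.append(bit)
        par.append(p)
    if terminate:
        for _ in range(MEMORY):
            # Choosing the input equal to the feedback sum shifts a 0 in.
            bit = parity(state & FEEDBACK & STATE_MASK)
            state, p = rsc_step(state, bit)
            systematic.append(bit)
            par.append(p)
    return systematic, par, state


def turbo_encode(info_bits, interleaver):
    """Return systematic bits, both parity streams, and the RSC2 end state.

    `interleaver` is a permutation of range(len(info_bits) + MEMORY), since
    the termination bits of RSC1 are interleaved along with the data.
    """
    x, y1, _ = rsc_encode(info_bits, terminate=True)
    interleaved = [x[i] for i in interleaver]
    _, y2, final_state = rsc_encode(interleaved, terminate=False)
    return x, y1, y2, final_state
```

Only the first encoder is terminated in this sketch; the second encoder runs over the interleaved sequence and ends in whatever state the data leaves it in, which matches the description above.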
The constituent decoders are implemented in the log domain and utilize the log-MAP algorithm [3,4]. The first decoder attempts to improve the systematic bit estimates with the additional information contained in Y1, while the second decoder attempts the same task with the additional information contained in Y2. The improved estimates produced by each decoder are the log-likelihood ratios (LLRs) and may be thought of as the sum of the systematic input ($L_{in}$) and the so-called extrinsic information ($L_{ex}$) obtained from the parity samples [1,2,3]. With this approach, the input channel samples must be scaled by the channel reliability factor $L_c$, which is a function of the channel signal-to-noise ratio.

Figure 2: Turbo decoder using two log-MAP component decoders.

Figure 3 shows a modified turbo decoder structure used in this implementation. The max-log-MAP algorithm, described in the next section, is used as the constituent decoder, and a correction operation is performed on both the extrinsic information and the constituent decoder output. Note that the channel reliability factor $L_c$ no longer needs to be estimated or applied, since the max-log-MAP decoder is not sensitive to scale factors. With appropriate correction, this modified decoder structure has been found to give performance within approximately 0. dB of the structure shown in Figure 2, for the same number of iterations. An additional half-iteration can often more than compensate for this difference in performance.

Figure 3: Modified turbo decoder using two max-log-MAP component decoders with corrections.

2.1 The Max-Log-MAP Algorithm

The max-log-MAP algorithm [3,4,7,9] calculates the LLRs according to

$$L_k = \max_m\left[A_k^m + D_k^{0,m} + B_{k+1}^{f(0,m)}\right] - \max_m\left[A_k^m + D_k^{1,m} + B_{k+1}^{f(1,m)}\right] \qquad (1)$$

where $k$ is the time index, $m$ is the present state ($m = 0, \ldots, M-1$), $f(d,m)$ is the forward or next state given present state $m$ and input bit $d \in \{0,1\}$, $A_k^m$ is the forward state metric for state $m$, $B_k^m$ is the reverse or backward state metric for state $m$, and $D_k^{d,m}$ is the branch metric given present state $m$ and input bit $d \in \{0,1\}$. The metrics are calculated according to:

$$A_k^m = \max\left[A_{k-1}^{b(0,m)} + D_{k-1}^{0,b(0,m)},\; A_{k-1}^{b(1,m)} + D_{k-1}^{1,b(1,m)}\right] \qquad (2)$$

$$B_k^m = \max\left[D_k^{0,m} + B_{k+1}^{f(0,m)},\; D_k^{1,m} + B_{k+1}^{f(1,m)}\right] \qquad (3)$$

$$D_k^{d,m} = x_k d' + y_k c'^{\,d,m} \qquad (4)$$

where $b(d,m)$ is the previous or backward state given present state $m$ and previous input bit $d \in \{0,1\}$, $x_k$ is the $k$-th systematic sample, $y_k$ is the $k$-th parity sample, $d$ is a systematic bit, and $c^{d,m}$ is the corresponding coded bit given state $m$ and information bit $d$. For binary antipodal signalling, the corresponding transmit symbols are given by $d' = 1 - 2d$ and $c'^{\,d,m} = 1 - 2c^{d,m}$.

The state metrics give a measure of the likelihood that state $m$ was the correct encoder state at time $k$, while the branch metrics measure the likelihood of the transmitted bits at time $k$ given the received signal samples. The forward state metrics are calculated starting at the beginning of the block (time 0) and working towards the end of the block (Equation (2)). The backward state metrics are calculated in the opposite direction, starting with the samples at the end of the data block and working back towards the beginning (Equation (3)).

A number of observations may be made about the algorithm. First, the use of the max approximation means that this method is not optimum with regard to MAP decoding. Second, as mentioned above, this particular method does not require an estimate of the channel signal-to-noise ratio. Third, this method is very similar to a standard Viterbi algorithm without history [3,4,7,9].
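The forward, backward, and LLR recursions described above can be sketched in floating-point Python as follows. This is an illustrative sketch, not the fixed-point DSP code: the octal polynomials 23 (assumed feedback) and 35, the symbol mapping 0 to +1 and 1 to -1, and the names `build_trellis` and `max_log_map` are assumptions. The forward pass is written in "push" form from each present state, which yields the same metrics as the max over predecessor states.

```python
# Floating-point sketch of max-log-MAP decoding for one 16-state RSC code.
NEG_INF = float("-inf")
FEEDBACK, FEEDFORWARD, MEMORY = 0o23, 0o35, 4   # assumed polynomial roles
NUM_STATES = 1 << MEMORY


def parity(value):
    return bin(value).count("1") & 1


def build_trellis():
    """Return next_state[m][d] = f(d, m) and parity symbols c'[m][d]."""
    next_state = [[0, 0] for _ in range(NUM_STATES)]
    symbol = [[0, 0] for _ in range(NUM_STATES)]
    for m in range(NUM_STATES):
        for d in (0, 1):
            feedback_bit = d ^ parity(m & FEEDBACK & (NUM_STATES - 1))
            register = (feedback_bit << MEMORY) | m
            next_state[m][d] = register >> 1
            symbol[m][d] = 1 - 2 * parity(register & FEEDFORWARD)
    return next_state, symbol


def max_log_map(x, y, terminated=True):
    """Return one LLR per bit; a positive value favours bit 0."""
    n = len(x)
    nxt, sym = build_trellis()
    # Branch metrics D[k][m][d] = x_k * d' + y_k * c'.
    D = [[[x[k] * (1 - 2 * d) + y[k] * sym[m][d] for d in (0, 1)]
          for m in range(NUM_STATES)] for k in range(n)]
    # Forward metrics, starting in the all-zeroes state.
    A = [[NEG_INF] * NUM_STATES for _ in range(n + 1)]
    A[0][0] = 0.0
    for k in range(n):
        for m in range(NUM_STATES):
            for d in (0, 1):
                cand = A[k][m] + D[k][m][d]
                if cand > A[k + 1][nxt[m][d]]:
                    A[k + 1][nxt[m][d]] = cand
    # Backward metrics; an unterminated code may end in any state.
    B = [[NEG_INF] * NUM_STATES for _ in range(n + 1)]
    if terminated:
        B[n][0] = 0.0
    else:
        B[n] = [0.0] * NUM_STATES
    for k in range(n - 1, -1, -1):
        for m in range(NUM_STATES):
            B[k][m] = max(D[k][m][d] + B[k + 1][nxt[m][d]] for d in (0, 1))
    # LLR: best path with d = 0 minus best path with d = 1.
    llrs = []
    for k in range(n):
        best = [NEG_INF, NEG_INF]
        for m in range(NUM_STATES):
            for d in (0, 1):
                total = A[k][m] + D[k][m][d] + B[k + 1][nxt[m][d]]
                if total > best[d]:
                    best[d] = total
        llrs.append(best[0] - best[1])
    return llrs
```

Because every operation is an addition or a maximum, multiplying all input samples by a positive constant multiplies every output LLR by the same constant and leaves the hard decisions unchanged. This is the scale insensitivity that removes the need for a channel reliability estimate.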
In fact, the max-log-MAP algorithm will find the same maximum likelihood path through the trellis as the Viterbi algorithm while producing soft outputs that can be used in successive decoding stages.

2.2 Implementation Issues

The codec is implemented on a pair of Analog Devices EZ-KIT development systems, each featuring an ADSP-2181 DSP chip. One acts as a general purpose channel simulator: random information bits are generated and encoded, an AWGN channel is simulated, the resulting channel samples are sent to the decoder via a serial port, and decisions from the decoder are compared with the original bits in order to compile error rate statistics. The second board acts as the decoder, taking in noisy samples and producing bit estimates. This test bed is flexible, low-cost, and illustrates the practicality of turbo codes.

Because the ADSP-2181 is a 16-bit fixed-point processor and the turbo decoder is iterative, precision and normalization are important issues.
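The channel-simulator side of such a test bed can be sketched in a few lines of Python. This is a host-side illustration under stated assumptions, not the DSP code: symbols have unit energy with bit 0 mapped to +1, the noise variance follows the usual relation sigma^2 = 1 / (2 R Eb/N0) for code rate R, and the function names are hypothetical. The trial below is uncoded; a real run would place the encoder and decoder around `awgn_channel`.

```python
import math
import random


def noise_sigma(ebno_db, code_rate):
    """Noise standard deviation per sample for unit-energy +/-1 symbols."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebno))


def awgn_channel(coded_bits, ebno_db, code_rate, rng):
    """Map bits to +/-1 (0 -> +1, an assumed convention) and add noise."""
    sigma = noise_sigma(ebno_db, code_rate)
    return [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in coded_bits]


def count_errors(sent, decided):
    """Count the positions where the decision differs from the sent bit."""
    return sum(1 for s, d in zip(sent, decided) if s != d)


def run_uncoded_trial(num_bits, ebno_db, seed=1):
    """Return the measured bit error rate of an uncoded transmission."""
    rng = random.Random(seed)
    bits = [rng.getrandbits(1) for _ in range(num_bits)]
    samples = awgn_channel(bits, ebno_db, 1.0, rng)
    decisions = [0 if s > 0 else 1 for s in samples]
    return count_errors(bits, decisions) / num_bits
```

Packet error rate follows from the same comparison by counting a block as in error when `count_errors` returns a non-zero value for it.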

Both the systematic samples themselves and the state metrics must be regularly normalized in order to prevent overflow. Note that the two requirements are related, since the magnitude of the state metrics is a function of the input signal strength.

Block normalization is used to prevent overflow of the systematic LLR values. That is, the current block of systematic samples is periodically scaled down such that the largest sample in the block does not exceed some predefined level. In the current decoder implementation, block normalization is performed after the extrinsic information has been subtracted, just before the samples are passed to the max-log-MAP decoder. This guarantees the desired signal level at the input to the max-log-MAP decoder. It is assumed that these samples have not overflowed since the last block normalization. The maximum tolerable signal level was determined empirically, and involved a tradeoff between precision and probability of overflow. It was found that 9 bits of precision (8 magnitude, 1 sign) satisfied these requirements, as evidenced by the fact that the fixed-point implementation was able to match C simulation results.

The state metrics themselves are periodically normalized by subtracting an arbitrary metric from the current set of M state metrics. It was found that with the input signal bounded by 9 bits, a normalization period of once every 3 input samples was sufficient to prevent the state metrics from overflowing.

A final point regarding normalization is that all of the sets of data in the turbo decoder must be adjusted in the same manner as the systematic samples. That is, the same scaling that is applied to the systematic data must also be applied to the parity samples and to the appropriate set of extrinsic samples. The relative confidence that is placed in the various data sets is therefore maintained as the LLRs grow.

3.0 Performance

This section discusses the performance of the turbo decoder.
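The two normalization steps can be sketched with integer arithmetic in Python. The details are assumptions made for illustration: scaling is done by a power-of-two shift that truncates toward zero, the limit is the largest 8-bit magnitude (255), the reference metric is that of state 0, and all function names are hypothetical. The paper states only that the block is scaled below a predefined level and that an arbitrary metric is subtracted.

```python
MAX_MAGNITUDE = (1 << 8) - 1   # 8 magnitude bits plus 1 sign bit = 9 bits


def required_shift(samples, max_magnitude=MAX_MAGNITUDE):
    """Smallest right shift that brings the block peak within the limit."""
    peak = max(abs(s) for s in samples)
    shift = 0
    while (peak >> shift) > max_magnitude:
        shift += 1
    return shift


def scale_down(samples, shift):
    """Divide by 2**shift, truncating toward zero (assumed rounding mode)."""
    return [(abs(s) >> shift) * (1 if s >= 0 else -1) for s in samples]


def normalize_block(systematic, parity1, parity2, extrinsic):
    """Apply the shift derived from the systematic block to every data set."""
    shift = required_shift(systematic)
    return tuple(scale_down(v, shift)
                 for v in (systematic, parity1, parity2, extrinsic))


def normalize_metrics(metrics, reference_state=0):
    """Subtract one (arbitrarily chosen) state metric from all M metrics."""
    reference = metrics[reference_state]
    return [m - reference for m in metrics]
```

Subtracting a common value from all state metrics does not change any of the max comparisons or the LLR differences, so only the dynamic range is affected. Applying one shift to all data sets keeps their relative weighting intact.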
The encoder used the structure shown in Figure 1, and each constituent RSC encoder used the TURBO4 polynomials given in [10], namely $(23_8, 35_8)$. The performance results given here were gathered from a fixed-point C simulation, but the performance of the ADSP-2181 implementation was found to be virtually identical for all of the block sizes that were compared.

Figure 4 and Figure 5 show bit error rate (BER) and packet error rate (PER) performance for nominal rate 1/2 coding and block sizes of 64, 128, 256, and 512 information bits. Note that the code rate is not exactly 1/2 due to the presence of 4 termination bits; the actual code rates for each block size are 0.471, 0.485, 0.492, and 0.496, respectively. As expected, increasing the number of iterations of the turbo decoder from 4 to 8 leads to an improvement in performance, approximately 0. dB at a BER of $10^{-4}$ and approximately 0. dB at a PER of $10^{-3}$. The effect of block size is apparent, and shows the advantage of using turbo codes with larger blocks.

Figure 4: BER performance for several block sizes and rate 1/2 coding.

Figure 5: PER performance for several block sizes and rate 1/2 coding.

Figure 6 and Figure 7 show the BER and PER performance for a block size of 512 information bits and nominal code rates of 1/3, 1/2, 2/3, and 3/4. Again, the different code rates were achieved by puncturing the parity bits produced by the turbo encoder and are reduced slightly by the addition of 4 termination bits. The actual code rates are 0.331, 0.496, 0.661, and 0.744, respectively.

Figure 6: BER performance for a block size of 512 information bits and several code rates.

Figure 7: PER performance for a block size of 512 information bits and several code rates.

Figure 8 and Figure 9 show the BER and PER performance of the turbo decoder against that of a flushed Viterbi decoder with K=7 and K=9 for a block size of 128 information bits. It can be seen that both the turbo decoder and the K=9 Viterbi decoder yield the same performance at a PER of approximately 0 -. In general, the turbo decoder performs better for higher signal-to-noise ratios while either Viterbi decoder performs better for lower signal-to-noise ratios. Even for this relatively small block size, it is apparent that the turbo decoder gives very competitive error rate performance. It is also interesting to note that the turbo decoder with 4 iterations and the K=9 Viterbi decoder have comparable complexities, based on DSP implementations and measured throughputs.

Table 1: Approximate throughput values, achieved and projected, on a 40 MIPS ADSP-2181.

Iterations    Achieved throughput (kbps)    Projected throughput (kbps)
4             16.8                          20
8             8.5                           10

Table 1 shows both achieved and projected throughput values for a 40 MIPS version of the ADSP-2181. The turbo decoder with 4 iterations achieved 16.8 kbps. More recent work with the ADSP-2106x SHARC, a 32-bit device, has resulted in a decoder able to perform 4 iterations at a speed of 48 kbps. The projected throughputs shown in Table 1 are based upon incorporating algorithmic enhancements already present in the SHARC implementation into the ADSP-2181 implementation.

The current ADSP-2181 implementation is able to accommodate a maximum block size of 650 information bits. This limit is dictated by the fact that only 32K of on-chip memory is available (16K data and 16K program).
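The effect of the termination bits on the code rate can be checked with a short calculation. The sketch below assumes that the 4 termination bits are coded and punctured at the same nominal rate as the information bits, so that a block of N information bits produces (N + 4) / R coded bits for nominal rate R; the function name `actual_code_rate` is hypothetical.

```python
def actual_code_rate(info_bits, nominal_num, nominal_den, termination_bits=4):
    """Information bits divided by the total number of transmitted bits."""
    coded_bits = (info_bits + termination_bits) * nominal_den / nominal_num
    return info_bits / coded_bits


if __name__ == "__main__":
    # Nominal rate 1/2 for the four block sizes considered.
    for n in (64, 128, 256, 512):
        print(n, round(actual_code_rate(n, 1, 2), 3))
    # Block size 512 for the four nominal code rates considered.
    for num, den in ((1, 3), (1, 2), (2, 3), (3, 4)):
        print(f"{num}/{den}", round(actual_code_rate(512, num, den), 3))
```

Under this assumption the calculation agrees with the quoted values, for example 0.485 for 128 information bits at rate 1/2 and 0.744 for 512 information bits at rate 3/4. It also shows why the rate loss shrinks as the block grows: the fixed overhead of 4 bits is spread over more information bits.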
It is expected that block sizes up to about 000 information bits could be accommodated with overlapped sub-block processing, though this would lead to a slight reduction in throughput.

A search for suitable interleavers to use with these block sizes was also done. While this search was not exhaustive, many different interleavers were tested and the results shown above were gathered with those that gave the best performance.

Figure 8: BER performance of the turbo decoder (4 and 8 iterations) versus that of a standard zero-flushed Viterbi decoder with K=7 and K=9 (128 information bits).

Figure 9: PER performance of the turbo decoder (4 and 8 iterations) versus that of a standard zero-flushed Viterbi decoder with K=7 and K=9 (128 information bits).

4.0 Conclusions

The structure and performance of a modified, low-complexity turbo decoder was presented. Performance results showed the effectiveness of turbo codes for large data blocks. Comparisons were drawn between the turbo decoder and Viterbi decoders of comparable complexity, showing that turbo codes display competitive performance even for relatively short data blocks. A version of this decoder implemented on the Analog Devices ADSP-2181, a low-cost, 16-bit fixed-point DSP chip, was also described. Throughput for 4 iterations of the turbo decoder implemented on a 40 MIPS ADSP-2181 processor was found to be 16.8 kbps.

References

[1] C. Berrou and A. Glavieux, "Near Optimum Error Correcting Coding and Decoding: Turbo-Codes," IEEE Transactions on Communications, Vol. 44, No. 10, October 1996.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes," Proceedings of ICC '93, Geneva, Switzerland, pp. 1064-1070, May 1993.
[3] P. Robertson, P. Hoeher, and E. Villebrun, "Optimal and Sub-Optimal Maximum a Posteriori Algorithms Suitable for Turbo Decoding," European Transactions on Telecommunications, Vol. 8, No. 2, March-April 1997.
[4] P. Robertson, E. Villebrun, and P. Hoeher, "A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain," Proceedings of ICC '95, Seattle, pp. 1009-1013, June 1995.
[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Trans. on Inform. Theory, Vol. IT-20, pp. 284-287, March 1974.
[6] J. Hagenauer, E. Offer, and L. Papke, "Iterative Decoding of Binary Block and Convolutional Codes," IEEE Trans. on Inform. Theory, Vol. 42, No. 2, pp. 429-445, March 1996.
[7] J. Erfanian, S. Pasupathy, and G. Gulak, "Reduced Complexity Symbol Detectors with Parallel Structures for ISI Channels," IEEE Trans. on Communications, Vol. 42, No. 2/3/4, pp. 1661-1671, February/March/April 1994.
[8] G. Forney, "The Viterbi Algorithm," Proceedings of the IEEE, Vol. 61, No. 3, pp. 268-278, March 1973.
[9] S. Pietrobon, "Implementation and Performance of a Turbo/MAP Decoder," submitted to the International Journal of Satellite Communications, February 1997.
[10] B. Talibart and C. Berrou, "Notice Preliminaire du Circuit Turbo-Codeur/Decodeur TURBO4," Version 0.0, June 1995.