IMPROVING TURBO CODES THROUGH CODE DESIGN AND HYBRID ARQ

By
HAN JO KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

I dedicate my dissertation to all of my family members: my father Muho, my mother Hwabong, my sister Suyeon, and my brother Hanjin.

ACKNOWLEDGMENTS

It has been a hard, tough seven years in Gainesville, but I was able to survive, and now I am finally graduating. I want to thank Dr. John Shea, Dr. Tan Wong, and all of my committee members for their academic advice. In particular, I could not have produced the research described in this dissertation without Dr. Shea's mentoring and instruction. I also thank all my lab colleagues and wish them a bright future. This dissertation is the fruit not only of my research but also of my family's support. My parents supported me consistently, both financially and emotionally, throughout my life. I cannot thank them enough.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
1.1 Improving the Performance of Turbo-Like Codes Through Code Design
1.1.1 Turbo Codes
1.1.2 Turbo-Like Codes
1.1.3 Concatenated Turbo Codes
1.1.4 Analysis of Turbo Codes
1.1.5 Design and Analysis of RPCC+Turbo Codes
1.2 Improving Turbo Codes Through Hybrid ARQ
1.2.1 Introduction to HARQ Schemes
1.2.2 Hybrid ARQ with Turbo Codes
1.2.3 Reliability-Based Hybrid ARQ
1.2.4 Reliability-Based Hybrid ARQ with Turbo Codes
1.3 Outline
2 IMPROVING THE PERFORMANCE OF TURBO CODES THROUGH CODE DESIGN
2.1 Introduction
2.2 System Model
2.3 Weight Spectrum-Based Performance Analysis
2.4 Code Design Based on Weight Spectrum-Based Analysis
2.5 Density Evolution-Based Performance Analysis
2.6 Code Design Based on Density Evolution-Based Analysis
2.6.1 Analysis on Effect of Different Constituent Code for Inner Turbo Code
2.6.2 Analysis on Effect of Unequal Energy for Systematic and Parity Symbols
3 IMPROVING THE PERFORMANCE OF TURBO CODES THROUGH HARQ
3.1 Introduction
3.2 System Model
3.3 RB-HARQ with Various Retransmission Schemes
3.3.1 RB-HARQ with Fixed Retransmission Size
3.3.2 RB-HARQ with Various Retransmission Size
3.4 HARQ with Rate Adaptation
4 DESIGNING CONVOLUTIONAL TURBO CODES THROUGH HARQ
4.1 Introduction
4.2 System Model
4.3 CTC with Unpunctured Convolutional Codes
4.4 CTC with Punctured Convolutional Codes
4.5 CTC with Asymmetric Constituent Codes
5 CONCLUSIONS
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
3.1 Performance of various retransmission schemes
3.2 Average value of soft output at various rates and Es/N0
3.3 Target rate for rate adaptation schemes
4.1 Total decoder complexity of CTC with different constituent codes

LIST OF FIGURES

Figure
1.1 A rate 1/3 turbo code encoder and decoder
1.2 Regions of convergence for turbo decoder
1.3 Serial-concatenated convolutional system: (a) encoder, (b) decoder
2.1 Concatenated coding scheme with turbo inner code and rectangular parity-check outer code
2.2 Performance of typical memory-2 turbo codes and RPCC+turbo codes
2.3 Performance of typical memory-3 turbo codes and RPCC+turbo codes
2.4 Performance of typical memory-4 turbo codes and RPCC+turbo codes
2.5 Comparison of performance for best turbo codes and RPCC+turbo codes of memory 2, 3, and 4
2.6 Iterative turbo decoder
2.7 (5,7) turbo code density evolution
2.8 Decoder convergence comparison between (5,7) and (15,13) turbo codes
2.9 Decoder of RPCC+turbo code
2.10 RPCC+turbo decoder
2.11 Decoder convergence comparison between turbo code and RPCC+turbo code (constituent code (15,13))
2.12 Decoder convergence comparison for memory-2 codes (constituent codes (5,7), (7,5), and (5,3))
2.13 Block error probability vs. iterations for memory-2 codes (constituent codes (5,7) and (7,5))
2.14 Decoder convergence comparison for memory-3 codes (constituent codes (15,13), (13,15), and (13,17))
2.15 Block error probability vs. iterations for memory-3 codes (constituent codes (15,13) and (13,17))
2.16 Decoder convergence comparison for memory-4 codes (constituent codes (21,37), (33,23), (37,22), and (33,31))
2.17 Block error probability vs. iterations for memory-4 codes (constituent codes (37,22) and (33,23))
2.18 Decoder convergence comparison for constituent codes (5,7), (13,17), and (37,22)
2.19 Block error probability vs. iterations for various constituent codes
2.20 Decoder convergence comparison for first energy allocation method for different ρ (constituent code (13,17))
2.21 Decoder convergence comparison for second energy allocation method for different ρ1 (constituent code (13,17))
2.22 Decoder convergence comparison for ρ = 0.6 (constituent code (13,17))
3.1 Probability of bit error by reliability rank for rate 1/3 RPCC+turbo code
3.2 Hybrid-ARQ system
3.3 Estimated number of bit errors versus actual number of bit errors for a block of 2500 bits
3.4 Mean of the soft output versus the percentage of bit errors for a block of 2500 bits (RPCC+turbo code)
3.5 Mean of the soft output versus the percentage of bit errors for a block of 1024 bits (turbo code)
3.6 Percentage of bits to request based on τ and the mean of soft output of error packet
3.7 Percentage of bits to request based on τ and the mean of soft output of error packet
3.8 Throughput for example RB-HARQ system using RPCC+turbo code
3.9 RB-HARQ performance with RPCC+turbo code, 4 retransmissions of 2.1% incremental redundancy each
3.10 RB-HARQ performance for different retransmission schemes
3.11 RB-HARQ performance for different retransmission schemes
3.12 Throughput for conventional and RB-HARQ
3.13 Packet error probability for various RB-HARQ schemes with rate adaptation and rate adaptation scheme without RB-HARQ
3.14 Total number of retransmissions for various RB-HARQ schemes with rate adaptation and rate adaptation scheme without RB-HARQ
3.15 Total number of retransmitted bits for various RB-HARQ schemes with rate adaptation and rate adaptation scheme without RB-HARQ (after the first rate adaptation)
3.16 Throughput for various RB-HARQ schemes with rate adaptation and rate adaptation scheme without RB-HARQ
4.1 CTC system
4.2 Separate interleaving and encoding scheme
4.3 Packet error probability for different schemes (10% retransmission)
4.4 Packet error probability for different schemes (50% retransmission)
4.5 Upper bound of packet error probability for random HARQ, normal interleaving and encoding scheme
4.6 Upper bound of packet error probability for random HARQ, separate interleaving and encoding scheme
4.7 Performance of punctured symmetric CTC with different initial rates
4.8 Performance of punctured symmetric CTC
4.9 Throughput efficiency of CTC and punctured turbo code with ARQ, fixed number (5) of turbo decoder iterations
4.10 Throughput efficiency of CTC and punctured turbo code with ARQ
4.11 Performance of punctured asymmetric CTC with different constituent codes

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMPROVING TURBO CODES THROUGH CODE DESIGN AND HYBRID ARQ

By

Han Jo Kim

December 2005

Chair: John M. Shea
Major Department: Electrical and Computer Engineering

We consider techniques to improve the performance of coded communication systems. Turbo codes have been shown to provide performance close to channel capacity for long block sizes. Their performance can be improved further through several techniques, including code concatenation, automatic repeat request (ARQ), and rate adaptation. We first investigate techniques to evaluate the performance of turbo-like codes and to design better codes. Then, we identify techniques to further improve the performance of turbo codes through code concatenation and hybrid ARQ. We investigate the performance, under reliability-based hybrid ARQ (RB-HARQ), of serial-concatenated codes that use turbo codes as inner codes and rectangular parity-check codes (RPCCs) as outer codes. We also investigate the performance of turbo codes with RB-HARQ and rate adaptation. The results show that these reliability-based hybrid ARQ techniques can provide significant performance gains. Finally, we propose a new HARQ approach that has the potential to reduce decoding complexity and improve performance at high signal-to-noise ratios.

CHAPTER 1
INTRODUCTION

In this work, we consider two techniques to improve the performance of turbo codes. In the first technique, we address the relatively high error floor of turbo codes by concatenating turbo codes with high-rate outer codes based on single parity-check codes. The second technique uses hybrid ARQ to address decoding errors caused both by low-weight error events and by decoder nonconvergence. These hybrid ARQ techniques use reliability information generated in the turbo decoding process to adapt the retransmission. In Section 1.1, we introduce turbo codes and related concatenated coding schemes, then briefly introduce the techniques that we use in this work to design and analyze such codes. In Section 1.2, we introduce hybrid automatic repeat request (ARQ) schemes and previous work on applying hybrid ARQ to turbo codes. Then we briefly discuss the application of reliability-based hybrid ARQ (RB-HARQ) to turbo codes.

1.1 Improving the Performance of Turbo-Like Codes Through Code Design

1.1.1 Turbo Codes

Parallel-concatenated convolutional codes (PCCCs), also known as turbo codes, were first introduced by Berrou, Glavieux, and Thitimajshima in 1993 [1] and have been shown to offer near-capacity performance for large block sizes. A turbo code is constructed by the parallel concatenation of two or more convolutional constituent codes separated by an interleaver. The encoder and decoder for a rate 1/3 turbo code are illustrated in Figure 1.1. One of the key advantages of turbo codes is that they can be decoded by a practical scheme whose complexity grows only linearly in the length of the code, whereas the complexity of maximum-likelihood (ML) decoding generally grows exponentially in the code length. For turbo codes, ML decoding is generally not feasible, but performance close to that of the ML decoder is possible if the signal-to-noise ratio
is high enough, by using an iterative maximum a posteriori (MAP) decoder, as shown in Figure 1.1b.

Figure 1.1: A rate 1/3 turbo code (a) encoder and (b) decoder. The encoder feeds the information bits to one recursive convolutional encoder directly and to a second one through an interleaver; the decoder consists of two MAP decoders that exchange extrinsic information L12 (from decoder 1 to decoder 2) and L21 (from decoder 2 to decoder 1) through an interleaver and deinterleaver.

This decoder iterates between decoders for the two constituent codes to successively refine the estimate of the codeword. Each constituent decoder is typically a variant of the BCJR MAP decoder for convolutional codes [2]. The performance of a turbo code with iterative decoding can be broken down into three regions, as illustrated in Figure 1.2. At high Eb/N0, the performance of turbo codes with iterative decoding is limited by low-weight error events [3]; this region is known as the error floor. Common constraints in communication systems, such as low-complexity constituent codes, small interleaver sizes, and a small number of decoder iterations, result in higher error probabilities in the error-floor region. In the nonconvergence region, which
occurs at very low Eb/N0, the decoder is almost never able to decode the block correctly. In the waterfall region, which lies between the nonconvergence and error-floor regions, the decoder finds the ML codeword only some of the time. In the nonconvergence and waterfall regions, the performance is limited not by the code but by the failure of the decoder to converge to the ML solution. This decoder failure is linked to fixed points of the decoding algorithm [4, 5].

Figure 1.2: Regions of convergence for turbo decoder (probability of block error versus Eb/N0, marking the nonconvergence, waterfall, and error-floor regions; simulation and analysis curves).

1.1.2 Turbo-Like Codes

Besides parallel concatenation of convolutional codes, many other ways to construct turbo-like concatenated codes have been proposed. Here we refer to any concatenated code that employs iterative decoding as a turbo-like code. Block turbo codes (BTCs) and serial-concatenated convolutional codes (SCCCs) are well-known examples of turbo-like codes. Block turbo codes use block codes instead of convolutional codes as constituent codes [6]. Typically, the block codes are used to form a product code. However, we note that the product code construction is equivalent to a parallel-concatenated code with rectangular interleaving of the information bits between two block encoders. Among the claims made for BTCs [7] are that they can be constructed with a large minimum distance
from constituent codes of low complexity and that they offer a good compromise between complexity and performance. An alternative to a parallel-concatenated code is a serial-concatenated code. Serial-concatenated convolutional codes (SCCCs) were proposed by Benedetto et al. [8]. The basic structure of the SCCC encoder and decoder is illustrated in Figure 1.3. The primary advantage of SCCCs is that serial concatenation generally yields codes with high minimum distance, and hence much steeper slopes of the asymptotic error-probability curves (error floors). However, the decoder for SCCCs usually converges much more poorly than that for PCCCs, requiring a higher Eb/N0 to operate in the waterfall or error-floor region.

Figure 1.3: Serial-concatenated convolutional system: (a) encoder, in which an outer recursive convolutional encoder is followed by an interleaver and an inner recursive convolutional encoder; (b) decoder, in which inner and outer MAP decoders exchange extrinsic information through an interleaver and deinterleaver.

1.1.3 Concatenated Turbo Codes

Since the introduction of the original turbo code, several authors have tried to improve the performance of turbo codes by using a turbo code as the inner code in a serial-concatenated coding structure [9-15]. Outer codes that have been considered include BCH codes [9, 11-14] and Reed-Solomon (RS) codes [10, 15]. The primary purpose of
the outer code is to correct the low-weight error events that are typically associated with the error floor, thus improving the asymptotic error performance. There are several disadvantages to using an outer code with an inner turbo code. To avoid degrading the performance of the turbo code, the outer code must have a high rate. Let Eb/N0 denote the energy per information bit and Ec/N0 the energy per coded bit. Then E_c = R_c E_b, where R_c is the rate of the overall concatenated code. In a serial concatenation, let the rates of the outer and inner codes be denoted by R_O and R_I, respectively, so that R_c = R_O R_I. Because R_O < 1, the outer code lowers the overall code rate, and hence the energy per coded bit seen by the turbo decoder, which degrades the performance of the turbo code. This reduction in Ec/N0 caused by the outer code is referred to as the rate penalty, which in dB is -10 log10 R_O. High-rate BCH and Reed-Solomon outer codes can be employed, but only at the expense of large block lengths. The rate penalty can be reduced through selective concatenation [12, 13], which uses the outer code to protect only the set of bits that are most likely to be involved in low-weight error patterns. An additional disadvantage of these codes is that their soft-decision decoders require significantly higher complexity than their typical algebraic decoders. Shea [16] proposed the use of a simple rectangular parity-check code (RPCC) as an outer code with an inner turbo code. These codes have the benefits of high rate (low rate penalty) and very simple soft-decision decoders. Shea [16] has shown that these codes have sufficient error-correction capability to dramatically reduce the error floor. A more detailed discussion of these codes is deferred until Chapter 2.

1.1.4 Analysis of Turbo Codes

There are two effective techniques to analyze the performance of a concatenated coding system. One is error probability analysis based on the input-output weight enumerating functions (IOWEFs) of the codes [17, 18]. The other is decoder convergence analysis using density evolution [19-21] or EXIT charts [22, 23]. These latter techniques track the improvement of the extrinsic information across decoder iterations.

In error probability analysis, the IOWEF for the turbo code is calculated from the IOWEFs of the recursive convolutional constituent codes, which are in turn found by computer searches or transfer function techniques [18]. The error probabilities can then be calculated using bounds, the most common of which is the standard union bound. As in [18, 24], the bit error probability can be upper-bounded (and thereby approximated) by

$$P_b \le \sum_{w=1}^{x} \frac{w}{x}\, W^{w}\, Q\!\left(\sqrt{\frac{2 d_{\min} R E_b}{N_0}}\right) e^{d_{\min} R E_b/N_0}\, A_w^{\mathrm{PC}}(Z, x)\Big|_{W=Z=e^{-R E_b/N_0}} \tag{1.1}$$

Here, w is the input weight, x is the information block length, d_min is the minimum distance, R is the rate of the PCCC, and A_w^PC(Z, x) is the conditional weight enumerating function (WEF) of the PCCC. The bounds generated in this way diverge around the cutoff rate. Here, "cutoff rate" is used loosely to mean the minimum Eb/N0 at which the channel cutoff rate exceeds the code rate. Because the waterfall region of turbo codes may occur at Eb/N0 below this threshold, the error probability bounds are typically useful only for predicting performance in the error-floor region.

Density evolution [19-21] can be used to analyze the convergence of iterative soft-input, soft-output (SISO) decoding. In iterative decoding, the extrinsic information produced by the decoders improves as the iterations progress, and density evolution measures this improvement quantitatively. If the interleaver is very large and random, the extrinsic information for different bits is approximately independent and identically distributed. We define the signal-to-noise ratio of the extrinsic information as SNR = µ²/σ², where µ and σ² are the mean and variance of the extrinsic information. At every iteration, the input SNR of one decoder equals the output SNR of the other decoder, so the convergence of the decoder can be analyzed graphically by plotting the input and output SNRs of the two decoders on a single graph.
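The cutoff-rate threshold and the extrinsic-information SNR defined above are easy to compute numerically. The sketch below is illustrative only. It assumes soft-decision BPSK on an AWGN channel, for which the cutoff rate is R_0 = 1 - log2(1 + e^(-Es/N0)) with Es = R Eb. It estimates SNR = µ²/σ² from a sample of LLRs after flipping each LLR's sign according to the transmitted bit. The function names are hypothetical.

```python
import math
import random
import statistics

def cutoff_rate_bpsk(es_n0):
    """Cutoff rate R0 (bits per channel use) of soft-decision BPSK on AWGN."""
    return 1.0 - math.log2(1.0 + math.exp(-es_n0))

def cutoff_threshold_db(rate):
    """Smallest Eb/N0 in dB at which R0 exceeds the code rate (Es = rate * Eb)."""
    # Solve 1 - log2(1 + exp(-rate * x)) = rate for the linear Eb/N0 value x.
    x = -math.log(2.0 ** (1.0 - rate) - 1.0) / rate
    return 10.0 * math.log10(x)

def llr_snr(llrs, bits):
    """SNR = mu^2 / sigma^2 of LLRs, with signs flipped so that + means 'correct'."""
    aligned = [l if b == 0 else -l for l, b in zip(llrs, bits)]
    mu = statistics.fmean(aligned)
    var = statistics.pvariance(aligned, mu)
    return mu * mu / var

print(round(cutoff_threshold_db(1 / 3), 2))  # -> 2.03 (dB) for a rate 1/3 code

# Channel LLRs for BPSK (bit 0 -> +1) with noise variance sigma2: L = 2y / sigma2.
rng = random.Random(1)
sigma2 = 0.5
bits = [rng.randint(0, 1) for _ in range(100_000)]
llrs = [2.0 * ((1 - 2 * b) + rng.gauss(0.0, math.sqrt(sigma2))) / sigma2 for b in bits]
print(llr_snr(llrs, bits))  # close to 1 / sigma2 = 2 for Gaussian channel LLRs
```

For Gaussian channel LLRs the mean is 2/σ² and the variance is 4/σ², so the measured SNR should be close to 1/σ². This is a quick sanity check before the estimator is applied to decoder outputs.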

1.1.5 Design and Analysis of RPCC+Turbo Codes

We consider strategies to optimize the performance of serial-concatenated codes that use turbo inner codes and RPCC outer codes. We refer to these as RPCC+turbo codes. We first investigate how the choice of turbo code affects the asymptotic performance (the error floor). Then we use density evolution to investigate the effect of different turbo codes on decoder convergence. Finally, we propose a strategy to adjust the relative energy of the systematic and parity symbols in order to shape the density evolution curves. We investigate the effects of these energy changes through density evolution and simulation.

1.2 Improving Turbo Codes Through Hybrid ARQ

1.2.1 Introduction to HARQ Schemes

Forward error correction (FEC) and automatic repeat request (ARQ) are two fundamental error-control techniques used in communication systems, and each has drawbacks. In an ARQ system, the throughput decreases rapidly as the channel error rate increases. In an FEC system, it is difficult to achieve both high reliability and high throughput: avoiding decoding errors may require a low code rate, which reduces throughput. To overcome these drawbacks, hybrid ARQ (HARQ), a combination of the two fundamental error-control techniques, was proposed by Burton and Sullivan [25] and by Rocher and Pickholtz [26]. A HARQ scheme can provide higher reliability and throughput than either scheme alone [27, 28]. Hybrid ARQ schemes can be divided into three categories, type-I, type-II, and type-III [28, 29], classified according to which bits are requested, retransmitted, and decoded. When an uncorrectable error pattern is detected in the received packet, a type-I hybrid ARQ scheme discards the whole packet and requests retransmission of the same packet until decoding succeeds [25, 26].
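To make the ARQ drawback concrete, the sketch below computes the throughput of an idealized type-I ARQ scheme. It assumes selective-repeat operation with instantaneous, error-free feedback and perfect error detection. Under those assumptions each packet needs on average 1/(1 - P) transmissions, and the throughput is R(1 - P). For an uncoded n-bit packet on a channel with independent bit errors of probability p, P = 1 - (1 - p)^n. The function names are hypothetical.

```python
def packet_error_prob(n_bits, p):
    """Probability that an uncoded n-bit packet has at least one bit error."""
    return 1.0 - (1.0 - p) ** n_bits

def arq_throughput(code_rate, packet_err):
    """Ideal selective-repeat type-I ARQ throughput (information bits per channel bit)."""
    # Each packet is sent 1 / (1 - packet_err) times on average.
    return code_rate * (1.0 - packet_err)

for p in (1e-5, 1e-4, 1e-3, 1e-2):
    pe = packet_error_prob(1000, p)
    print(f"p={p:.0e}  throughput={arq_throughput(1.0, pe):.3f}")
```

For a 1000-bit uncoded packet, the throughput falls from about 0.99 at p = 1e-5 to about 0.37 at p = 1e-3, and it is essentially zero at p = 1e-2. This collapse is the behavior that motivates adding FEC.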
Hybrid ARQ schemes based on Viterbi decoding of convolutional codes and on sequential decoding of convolutional codes were introduced by Yamamoto and Itoh [30] and by Drukarev and Costello Jr. [31], respectively. In type-II HARQ, when an error is detected, the received information packet is kept and a parity packet is requested, and both packets are used in error-correction decoding. If decoding still fails, the receiver requests either the original information packet or another parity packet, depending on the retransmission strategy. Each retransmitted packet is combined with the previously received packets, and error-correction decoding is performed again. This request-and-combine process continues until decoding succeeds [28]. Thus, all received packets are used for decoding in type-II HARQ, but not in type-I HARQ. Type-II HARQ has been applied to systems using Reed-Solomon (RS) and Reed-Muller (RM) codes by Pursley and Sandberg [32] and by Wicker and Bartz [33, 34], respectively. A hybrid ARQ system based on rate-compatible punctured convolutional (RCPC) codes was introduced by Hagenauer [35]. Many type-II HARQ schemes have been developed based on incremental parity retransmission [36-38]. Type-III hybrid ARQ is a variation of type-II HARQ [29] in which complementary punctured convolutional (CPC) codes provide the redundancy and every retransmission includes both data and parity bits. Packet combining can improve the performance of both type-II and type-III hybrid ARQ. The HARQ scheme developed in this work uses incremental parity retransmission and packet combining.

1.2.2 Hybrid ARQ with Turbo Codes

A hybrid ARQ scheme that uses a turbo code for FEC was proposed by Narayanan [38]. Rowitch and Milstein [39] extended the RCPC-based hybrid ARQ scheme to rate-compatible punctured turbo (RCPT) codes. Optimization of the puncturing patterns for turbo codes is described by Açikel and Ryan [40]. Mantha and Kschischang [41] use a parity spreading interleaver (PSI) to generate incremental parity and achieve performance similar to that of optimized puncturing at much lower complexity.
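The incremental-parity idea behind type-II HARQ can be sketched as a transmission schedule. This is an illustrative example only. The function name and the chunking rule are assumptions, not the scheme developed in this work. The first transmission carries the k information bits and the first chunk of parity bits from a low-rate mother code. Each retransmission carries the next unsent chunk. The receiver combines everything received so far, so the effective code rate falls after each request.

```python
def incremental_parity_schedule(k, num_parity, chunk):
    """List (parity index range, effective code rate) for each transmission."""
    schedule = []
    sent = 0
    while sent < num_parity:
        new = min(chunk, num_parity - sent)
        indices = range(sent, sent + new)   # parity bits carried by this packet
        sent += new
        rate = k / (k + sent)               # receiver combines all packets so far
        schedule.append((indices, rate))
    return schedule

# Rate 1/3 mother code: 1000 information bits and 2000 parity bits.
for i, (idx, rate) in enumerate(incremental_parity_schedule(1000, 2000, 500)):
    print(f"transmission {i}: parity {idx.start}-{idx.stop - 1}, rate {rate:.3f}")
```

In this example the effective rate steps through 2/3, 1/2, 2/5, and finally the mother-code rate 1/3. If decoding succeeds early, the remaining parity is never sent.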
Ji and Stark [42] used an RS-turbo serial-concatenated code with hybrid ARQ for a DS/SSMA data network, and Banerjee and Fuja [43] proposed turbo trellis-coded modulation (TTCM) with ARQ. A hybrid ARQ protocol based on viewing a PCCC as a punctured SCCC is given by Wu and Valenti [44].

1.2.3 Reliability-Based Hybrid ARQ

Reliability-based hybrid ARQ (RB-HARQ) was proposed by Shea [45]. The RB-HARQ technique is motivated by an understanding of the limitations of iterative decoding and can be used with codes that employ soft-input, soft-output (SISO) decoders (cf. [46]), such as turbo codes and convolutional codes. SISO decoders typically accept estimates of the a priori probabilities and output estimates of the a posteriori probabilities (APPs) of the message bits. In RB-HARQ, the bits to be retransmitted are adaptively selected at the receiver based on the estimated APPs. The RB-HARQ technique has been shown to be effective in several scenarios [48-52].

1.2.4 Reliability-Based Hybrid ARQ with Turbo Codes

We consider strategies to optimize the performance of hybrid ARQ with turbo codes. We investigate the impact of various retransmission schemes on the performance of RB-HARQ with RPCC+turbo codes and with turbo codes that have no outer code. We also discuss a technique that reduces the retransmission-request overhead by using reliability information to select only the amount of information to be retransmitted, rather than the specific bits.

1.3 Outline

The rest of this work is organized as follows. In Chapter 2, we first investigate how the choice of constituent code affects the asymptotic performance (the error floor) of turbo codes and RPCC+turbo codes. Then we use density evolution to investigate the effect of different turbo codes on decoder convergence. Finally, we propose a strategy to adjust the relative energy of the systematic and parity symbols in order to improve decoder convergence. We investigate the effects of these energy changes through density evolution and simulation. In Chapter 3, we consider the use of reliability-based hybrid ARQ with turbo codes.
We investigate the impact of various retransmission schemes on the performance of RB-HARQ with RPCC+turbo codes and with turbo codes that have no outer code. We compare the performance of an RB-HARQ scheme that requests specific unreliable bits with that of an RB-HARQ scheme that requests only a specific amount of information to be retransmitted. Finally, conclusions and future work are discussed in Chapter 5.
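The central step of RB-HARQ introduced in Section 1.2.3 is choosing which bits to request from the decoder's APP estimates. The following minimal illustration assumes that the soft outputs are a posteriori log-likelihood ratios, that their magnitude measures reliability, and that a fixed number of bits is requested. The function name is hypothetical, and the selection rules studied in Chapter 3 may differ.

```python
import numpy as np

def select_unreliable_bits(soft_output, num_request):
    """Return the sorted indices of the num_request least reliable bits (smallest |LLR|)."""
    reliability = np.abs(np.asarray(soft_output, dtype=float))
    least_reliable = np.argsort(reliability, kind="stable")[:num_request]
    return np.sort(least_reliable)

llrs = [5.0, -0.2, 3.1, 0.4, -7.0, 1.5]
print(select_unreliable_bits(llrs, 2))  # -> [1 3]
```

Only the magnitude of each LLR matters here, because the sign is the hard decision and the magnitude is the decoder's confidence in it. The receiver would send the returned indices back to the transmitter as the retransmission request.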

CHAPTER 2
IMPROVING THE PERFORMANCE OF TURBO CODES THROUGH CODE DESIGN

In this chapter, we consider the analysis and design of concatenated rectangular parity-check codes and turbo codes (RPCC+turbo codes) [16].

2.1 Introduction

Despite their near-capacity performance for large interleaver sizes, turbo codes are limited by low information-weight error events at the output of the decoder. These low-weight error events dominate the performance as the signal-to-noise ratio (SNR) increases [3] and cause an error floor. One way to reduce the performance loss caused by these low-weight events is to select the interleaver carefully [3]. But even with designed interleavers, the performance of turbo codes at high SNRs is dominated by low-weight error events. As discussed in Chapter 1, another approach is to serially concatenate a high-rate outer code with an inner turbo code. One particularly attractive choice for the outer code is a rectangular parity-check code (RPCC). This approach is motivated by the fact that the dominant error events have low weight: by serially concatenating an RPCC with a turbo code, these low-weight error events can be corrected, and the packet error probability can be significantly improved [52]. In this chapter, we consider strategies to optimize the performance of RPCC+turbo codes. We first investigate how the choice of turbo code affects the asymptotic performance (the error floor). Then we use density evolution [19-21] to investigate the effect of different turbo codes on decoder convergence. Finally, we propose a strategy to adjust the relative energy of the systematic and parity symbols in order to shape the density evolution curves. We investigate the effects of these energy changes through density evolution and simulation.

Figure 2.1: Concatenated coding scheme with turbo inner code and rectangular parity-check outer code (RPCC encoder, interleaver, turbo encoder, AWGN channel, RPCC+turbo decoder).

2.2 System Model

Consider a packet communication system that uses a serially concatenated coding scheme with a rectangular parity-check code (RPCC) as the outer code and a rate 1/3 turbo code as the inner code. The RPCC and the turbo code are separated by an interleaver, as illustrated in Figure 2.1. The RPCC codeword is generated by placing the information bits into a rectangular array and then calculating the even parity of each row and each column. Throughout this work, we assume that the packet size in information bits is the square of an integer, N² bits, so that the array used for calculating the parity is square. The number of parity bits is then 2N (N bits each in the horizontal and vertical directions), so the rate of the RPCC is N²/(N² + 2N) = N/(N + 2), which approaches 1 as N becomes large. The encoded packet is transmitted over a channel using binary or quaternary phase-shift keying (BPSK or QPSK), and the received symbols are corrupted by additive white Gaussian noise (AWGN). Both the iterative turbo decoder [1, 53, 54] and the iterative RPCC decoder [2] are based on exchanging extrinsic information between MAP [2] decoders. Our decoding algorithm extends the turbo decoding techniques described by Berrou [1] and the decoding techniques for rectangular parity-check codes described by Hagenauer [54]. We treat the RPCC as a parallel concatenation of parity-check codes, each defined by the parity-check bits along one dimension and all of the data bits. Each component code employs the SISO decoding module suggested by Hagenauer [54].
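A minimal sketch of the RPCC encoder and of iterative row/column decoding follows. It assumes the N² information bits fill the array row by row, that the N row parities are followed by the N column parities, and that a positive LLR favors bit 0. The bit ordering, the sign convention, and the function names are assumptions. The decoder handles the RPCC alone rather than the full RPCC+turbo code. Each single-parity-check component uses the tanh rule as a stand-in for the SISO module, takes the other dimension's extrinsic information as its a priori input, and the soft output is the intrinsic LLR plus all extrinsic LLRs.

```python
import numpy as np

def rpcc_encode(info_bits):
    """Append even row and column parities to N*N information bits."""
    info = np.asarray(info_bits, dtype=np.uint8)
    n = int(round(np.sqrt(info.size)))
    if n * n != info.size:
        raise ValueError("block size must be a perfect square N**2")
    array = info.reshape(n, n)                    # information bits, row by row
    row_parity = array.sum(axis=1) % 2            # one even-parity bit per row
    col_parity = array.sum(axis=0) % 2            # one even-parity bit per column
    return np.concatenate([info, row_parity, col_parity]).astype(np.uint8)

def rpcc_rate(n):
    """Rate N^2 / (N^2 + 2N) = N / (N + 2) of an N x N RPCC."""
    return n / (n + 2)

def rate_penalty_db(outer_rate):
    """Loss in Ec/N0 (dB) caused by an outer code of rate R_O."""
    return -10.0 * np.log10(outer_rate)

def spc_extrinsic(llrs):
    """Extrinsic LLR of every bit in one even-parity check (tanh rule)."""
    t = np.tanh(np.clip(llrs, -40.0, 40.0) / 2.0)
    ext = np.empty_like(t)
    for i in range(t.size):
        p = np.prod(np.delete(t, i))              # product over the other bits only
        ext[i] = 2.0 * np.arctanh(np.clip(p, -0.999999, 0.999999))
    return ext

def rpcc_decode(ch_info, ch_row, ch_col, iterations=4):
    """Iterative row/column SISO decoding of the RPCC alone."""
    ch_row = np.asarray(ch_row, dtype=float)
    ch_col = np.asarray(ch_col, dtype=float)
    n = ch_row.size
    ch = np.asarray(ch_info, dtype=float).reshape(n, n)
    ext_row = np.zeros((n, n))                    # extrinsic from the row checks
    ext_col = np.zeros((n, n))                    # extrinsic from the column checks
    for _ in range(iterations):
        for r in range(n):                        # a priori = column extrinsic
            ext_row[r] = spc_extrinsic(np.append(ch[r] + ext_col[r], ch_row[r]))[:n]
        for c in range(n):                        # a priori = row extrinsic
            ext_col[:, c] = spc_extrinsic(np.append(ch[:, c] + ext_row[:, c], ch_col[c]))[:n]
    soft = ch + ext_row + ext_col                 # intrinsic + all extrinsic
    return (soft < 0).astype(np.uint8).ravel(), soft.ravel()

print(round(rpcc_rate(50), 3), round(rate_penalty_db(rpcc_rate(50)), 2))  # -> 0.962 0.17

# N = 4, all-zero codeword sent as +1; one information bit received weakly in error.
ch_info = np.full(16, 4.0)
ch_info[5] = -1.0
bits, soft = rpcc_decode(ch_info, np.full(4, 4.0), np.full(4, 4.0))
print(bits.sum())  # -> 0, the error is corrected
```

For a 2500-bit block (N = 50), the RPCC rate is 50/52 and the rate penalty is only about 0.17 dB. A single input 1 produces a codeword of weight 3: the bit itself, one row parity, and one column parity.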
There are several possible ways to exchange extrinsic information among the RPCC decoder and the decoders for the constituent codes of the turbo code. In all that follows, we assume that extrinsic information from all of the other codes is used to decode a particular code. This form of decoding has been referred to as extended-serial (ES) decoding [55] and generally provides the best performance among the extrinsic-information exchange schemes. Each constituent decoder also uses the intrinsic information from the channel measurement, which remains unchanged during decoding. Before a constituent decoder operates on the intrinsic and extrinsic information, its own previous contribution to the extrinsic information is subtracted from the total extrinsic information. The output of each constituent decoder is the sum of its new extrinsic information and the extrinsic information from all of the other decoders. For each systematic bit, the sum of the intrinsic information and the extrinsic information from all of the constituent decoders forms the soft output of the entire decoder. Hard decisions are made based on the signs of these soft outputs, which are estimates of the a posteriori log-likelihood ratios.

2.3 Weight Spectrum-Based Performance Analysis

As described by Divsalar [17] and Benedetto [18], the performance of concatenated coding systems can be approximated through the use of their input-output weight enumerating functions (IOWEFs). The IOWEF for the turbo code is calculated from the IOWEFs of the recursive convolutional codes, which are in turn found by computer searches or transfer function techniques [18]. In this work, the standard union bound is used to calculate the error probabilities. All of our analytical results are based on the uniform interleaver [18] and maximum-likelihood (ML) decoding. Thus, the analytical results are valid for systems that employ a random interleaver and for which the performance of the iterative decoder is comparable to that of the ML decoder.
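The standard union bound can be evaluated directly from a table of IOWEF coefficients. The sketch below assumes BPSK on an AWGN channel, ML decoding, and an IOWEF stored as a dictionary that maps (w, h) to A_{w,h}. It computes P_b ≤ Σ (w/K) A_{w,h} Q(√(2 h R Eb/N0)) for a K-bit input block. The dictionary format and function names are illustrative choices. A truncated IOWEF gives an approximation that is meaningful only above the cutoff rate.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound_ber(iowef, k, rate, ebn0_db):
    """Union bound on bit error probability: ML decoding, BPSK over AWGN."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return sum((w / k) * a * q_function(math.sqrt(2.0 * h * rate * ebn0))
               for (w, h), a in iowef.items())

def union_bound_wer(iowef, rate, ebn0_db):
    """Union bound on word (block) error probability."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return sum(a * q_function(math.sqrt(2.0 * h * rate * ebn0))
               for (w, h), a in iowef.items())

# Sanity check: one uncoded bit recovers the BPSK bit error probability.
print(union_bound_ber({(1, 1): 1}, k=1, rate=1.0, ebn0_db=9.6))  # roughly 1e-5
```

Because each term decays as Q(√(2 h R Eb/N0)), the low-weight terms dominate at high Eb/N0. For that reason, only the first few IOWEF coefficients are needed in the error-floor region.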
2.4 Code Design Based on Weight Spectrum-Based Analysis

As discussed in Section 2.3, the error probabilities of the concatenated code are bounded with the standard union bound, using IOWEFs computed under the uniform-interleaver [18] and ML-decoding assumptions. The low-order terms of the IOWEF for the RPCC can be found using combinatorics, provided that the dimension of the code is small. As in Benedetto [18], let $w$ denote the number of bits input to the RPCC that have value 1. The IOWEF for the RPCC is $A^O(W,H) = \sum_{w,h} A_{w,h} W^w H^h$, where $A_{w,h}$ denotes the number of codewords generated by an input sequence of Hamming weight $w$ that have a total Hamming weight (both parity and systematic bits) of $h$. Define the conditional weight enumerating function (CWEF) of the outer RPCC given input weight $w$ as $A^O_w(H) = \sum_h A_{w,h} H^h$. If $w = 1$, there will be two parity bits with value 1 (one row parity bit and one column parity bit), and the remaining parity bits will have value 0, resulting in a total of three bits with value 1 at the output of the RPCC. We have calculated the enumerations for several of the low-weight codewords of the RPCC. The terms that we include are usually sufficient for use in the union bound for SNRs above the cutoff rate, and we include some bounds on the weight enumerations for terms with higher weights. If an error bound requires knowledge of many of the terms in the IOWEF, then Monte-Carlo simulation can be used to estimate the IOWEF. Let $B$ denote the block size of the code (in information bits), and define the dimension of the RPCC as $D = \sqrt{B}$.
The CWEFs for $w = 2$ and $w = 3$ are then given by

$$A^O_2(H) = 2D\binom{D}{2} H^4 + 2\binom{D}{2}^2 H^6,$$

and

$$A^O_3(H) = 4\binom{D}{2}^2 H^5 + \left[2D\binom{D}{3} + \frac{D(D-1)\,D!}{(D-3)!}\right] H^7 + 3!\binom{D}{3}^2 H^9.$$
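As a sanity check on these closed forms, the following Python sketch enumerates every weight-$w$ input to a small $D \times D$ RPCC and tallies the output weights. It assumes that the RPCC has one parity bit per row and one per column, with no parity-on-parity bit, which is consistent with a weight-1 input producing output weight 3; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, factorial


def rpcc_output_weight(ones):
    """Total output weight (systematic + parity) of one RPCC codeword.

    ones: tuple of (row, col) positions of the input bits equal to 1.
    Assumed structure: one parity bit per row and per column, no corner bit.
    """
    row_counts = Counter(r for r, _ in ones)
    col_counts = Counter(c for _, c in ones)
    odd_rows = sum(n % 2 for n in row_counts.values())  # row parity bits equal to 1
    odd_cols = sum(n % 2 for n in col_counts.values())  # column parity bits equal to 1
    return len(ones) + odd_rows + odd_cols


def brute_force_cwef(D, w):
    """Count codewords by output weight over all weight-w inputs: {h: A_{w,h}}."""
    cells = [(r, c) for r in range(D) for c in range(D)]
    return Counter(rpcc_output_weight(s) for s in combinations(cells, w))


def closed_form_cwef(D, w):
    """Coefficients of A^O_2(H) and A^O_3(H) as given in the text (D >= 3)."""
    c2, c3 = comb(D, 2), comb(D, 3)
    if w == 2:
        return {4: 2 * D * c2, 6: 2 * c2**2}
    if w == 3:
        return {
            5: 4 * c2**2,
            7: 2 * D * c3 + D * (D - 1) * factorial(D) // factorial(D - 3),
            9: factorial(3) * c3**2,
        }
    raise ValueError("closed forms are given only for w = 2 and w = 3")
```

For example, with $D = 4$ both functions give 144, 320, and 96 weight-3 inputs producing output weights 5, 7, and 9, which together account for all $\binom{16}{3} = 560$ weight-3 inputs.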

For $w > 3$, it becomes much more difficult to determine the CWEF, so we use the following techniques to bound the performance of the overall code. For input weight $w$, the number of codewords with output weight $3w$ is $\binom{D}{w}^2 w!$. If $w$ is not divisible by four, then the minimum output weight is $w + 2$. Thus, for the purposes of our transfer function bound, we develop an upper bound on the error probabilities by supposing that, for $w > 4$ and $w$ not divisible by four, all of the codewords that do not have weight $3w$ have weight $w + 2$, which yields the following approximation:

$$A^O_w(H) \approx \binom{D}{w}^2 w!\, H^{3w} + \left[\binom{D^2}{w} - \binom{D}{w}^2 w!\right] H^{w+2}, \quad w > 4,\ w \text{ not divisible by } 4.$$

For $w = 4$, we find that the number of codewords with output weight four is $\binom{D}{2}^2$, and we use the following approximation to the CWEF:

$$A^O_4(H) \approx \binom{D}{2}^2 H^4 + \binom{D}{4}^2 4!\, H^{12} + \left[\binom{D^2}{4} - \binom{D}{4}^2 4! - \binom{D}{2}^2\right] H^6.$$

For $w > 4$ and $w$ divisible by four, we develop an upper bound on the error probabilities by supposing the worst-case situation, in which all codewords that do not have output weight $3w$ have output weight $w$. Thus, the CWEF for these cases is approximated by

$$A^O_w(H) \approx \binom{D}{w}^2 w!\, H^{3w} + \left[\binom{D^2}{w} - \binom{D}{w}^2 w!\right] H^{w}, \quad w > 4,\ w \text{ divisible by } 4.$$

Assuming that a uniform interleaver is used between the RPCC and the turbo code, the IOWEF of the overall serially concatenated code is given by Benedetto [56] as

$$A^C_{w,h} = \sum_{l=0}^{N} \frac{A^O_{w,l}\, A^I_{l,h}}{\binom{N}{l}},$$

where $A^I_{l,h}$ denotes the IOWEF of the inner (turbo) code and $N$ denotes the block length at the output of the outer (RPCC) code. The bit error probability is approximately

$$P_b \approx \sum_{w,h} \frac{w\, A^C_{w,h}}{B}\, Q\left(\sqrt{2 R_C h \frac{E_b}{N_0}}\right), \tag{2.1}$$

where $R_C$ is the rate of the overall code.
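The approximate CWEFs, the uniform-interleaver composition, and the union bound above are mechanical enough to sketch in Python. In this sketch the information bits fill a $D \times D$ array ($B = D^2$), IOWEFs are plain dictionaries keyed by (input weight, output weight), and `N` is the interleaver length (the block length at the output of the RPCC). The function names are hypothetical, and the inner turbo-code IOWEF would have to come from a separate computation that is not shown. The block-error estimate is the same sum as (2.1) without the factor $w/B$.

```python
from math import comb, erfc, factorial, sqrt


def rpcc_cwef_approx(D, w):
    """Approximate CWEF {output_weight: count} of a D x D RPCC for input weight w >= 4.

    Follows the worst-case assumptions in the text: codewords that do not have
    weight 3w are assigned weight w + 2 (w not divisible by 4) or w (w divisible by 4).
    """
    if w < 4:
        raise ValueError("use the exact CWEFs for w <= 3")
    n_inputs = comb(D * D, w)                # all weight-w inputs (B = D^2)
    n_3w = comb(D, w) ** 2 * factorial(w)    # ones in distinct rows and columns
    if w == 4:
        n_rect = comb(D, 2) ** 2             # rectangles: all parity bits zero
        return {4: n_rect, 12: n_3w, 6: n_inputs - n_3w - n_rect}
    if w % 4 == 0:
        return {3 * w: n_3w, w: n_inputs - n_3w}
    return {3 * w: n_3w, w + 2: n_inputs - n_3w}


def serial_iowef(outer, inner, N):
    """Uniform-interleaver composition A^C_{w,h} = sum_l A^O_{w,l} A^I_{l,h} / C(N, l).

    outer, inner: dicts {(input_weight, output_weight): count}; N: interleaver length.
    """
    inner_by_input = {}
    for (l, h), a_i in inner.items():
        inner_by_input.setdefault(l, []).append((h, a_i))
    out = {}
    for (w, l), a_o in outer.items():
        for h, a_i in inner_by_input.get(l, []):
            out[(w, h)] = out.get((w, h), 0.0) + a_o * a_i / comb(N, l)
    return out


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def union_bounds(iowef, B, rate, ebno_db):
    """Union-bound estimates (P_b, P_B) following (2.1) and its block-error counterpart."""
    ebno = 10.0 ** (ebno_db / 10.0)          # E_b/N_0 as a linear ratio
    p_bit = p_block = 0.0
    for (w, h), a in iowef.items():
        q = q_func(sqrt(2.0 * rate * h * ebno))
        p_bit += w * a / B * q
        p_block += a * q
    return p_bit, p_block
```

To feed `serial_iowef`, an outer IOWEF can be assembled from the per-weight CWEFs, for example `{(w, h): a for w in range(4, 9) for h, a in rpcc_cwef_approx(D, w).items()}`, together with the exact low-weight terms.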

Similarly, the probability of block error is approximately

$$P_B \approx \sum_{w,h} A^C_{w,h}\, Q\left(\sqrt{2 R_C h \frac{E_b}{N_0}}\right). \tag{2.2}$$

Equations (2.1) and (2.2) can be used to select a code based on its bit or block error performance, respectively. However, there are two limitations to the use of these bounds. First, these are union bounds, which diverge for SNRs below the cutoff rate. Second, the performance at low SNR tends to be dominated by the limitations of iterative decoding more than by the weight distribution of the code. In general, the union bounds predict the performance only in the error-floor region, in which the performance of the iterative decoder is almost equivalent to that of the ML decoder. Thus, we can use (2.1) and (2.2) to select turbo codes, for use by themselves or as inner codes in RPCC+turbo codes, based on their error-floor performance. In what follows, we consider only the block error probability $P_B$, given by (2.2), because for most types of information that would be turbo-encoded, no bit errors are acceptable in a packet.

We first consider the performance of codes with different memory orders. Typical memory orders for turbo codes are 2, 3, or 4, which correspond to 4-, 8-, or 16-state constituent codes, respectively. For each memory order, we consider several codes of different designs that we believe span the expected range of behavior of codes at that memory order. For all of the results below, we illustrate the performance for a block size of 2500 information bits; the performance trade-offs may be different for other block sizes. In this work, we use octal representations of generator polynomials in the form (feedforward polynomial, feedback polynomial). Thus, we refer to a turbo code that uses identical constituent codes with feedforward generator polynomial $1 + D^2$ and feedback generator polynomial $1 + D + D^2$ as a turbo code that uses (5,7) constituent codes. For memory-2, we consider the typical (5,7) code, its inverse, the (7,5) code, and the (5,3) code.
The (7,5) code has a nonprimitive feedback polynomial, $1 + D^2 = (1 + D)^2$, and the (5,3) code is a big-numerator little-denominator code [57]. We consider the use of these three codes by themselves and as the inner code in an RPCC+turbo code. The block error probabilities for these codes are illustrated in Fig. 2.2. The results show that the (5,7) code is the best choice for either the turbo code or the RPCC+turbo code. Note the huge performance advantage predicted for RPCC+turbo codes over conventional turbo codes. For instance, at 1.6 dB, the best turbo code achieves $P_B \approx 3 \times 10^{-3}$, while the RPCC+turbo codes all achieve $P_B < 10^{-6}$. This performance advantage increases with $E_b/N_0$ for the RPCC+turbo codes that use the (5,7) and (7,5) constituent codes because these codes have a steeper error floor.

Figure 2.2: Performance of typical memory-2 turbo codes and RPCC+turbo codes (probability of block error versus $E_b/N_0$ in dB for the (5,3), (7,5), and (5,7) turbo and RPCC+turbo codes).

For memory-3, we consider the (15,13) 3GPP turbo code, its inverse, the (13,15) code, and the (13,17) code. The (13,17) code uses the nonprimitive feedback polynomial $1 + D + D^2 + D^3 = (1 + D)^3$. The block error probabilities for these codes are illustrated in Fig. 2.3. The (13,15) and (15,13) codes offer identical performance, both in a turbo code and in an RPCC+turbo code. The (13,17) turbo code is over an order of magnitude worse in $P_B$ than the turbo codes constructed from the other two codes. However, for RPCC+turbo codes, the (13,17) code starts off worse at lower SNRs but has a much steeper slope to the error floor and thus better asymptotic error performance.

Figure 2.3: Performance of typical memory-3 turbo codes and RPCC+turbo codes (probability of block error versus $E_b/N_0$ in dB for the (13,17), (13,15), and (15,13) turbo and RPCC+turbo codes).

Figure 2.4: Performance of typical memory-4 turbo codes and RPCC+turbo codes (probability of block error versus $E_b/N_0$ in dB for the (21,37), (37,22), and (33,23) turbo and RPCC+turbo codes).

For memory-4, we consider the (21,37) code used by Berrou and Glavieux [1] in the original turbo code papers, the (33,23) code, which has been found to be a good code by Divsalar and others [18, 24, 58, 59], and the (37,22) big-numerator little-denominator code [57]. The (37,22) code has the nonprimitive feedback polynomial $1 + D^3 = (1 + D)(1 + D + D^2)$. The block error probabilities for these codes are illustrated in Fig. 2.4. The results show that over most values of $E_b/N_0$ of interest, the codes based on the (33,23) code offer the best performance. It is interesting to note how poorly the Berrou code performs in comparison to the other codes of memory 4. For instance, the (21,37) Berrou turbo code gives a $P_B$ that is approximately three orders of magnitude higher than that of the (33,23) turbo code. For very high values of $E_b/N_0$, the (37,22) big-numerator little-denominator code will offer the best performance because of its steep error floor.

Based on these results, if good performance in the error-floor region is the goal, we can select the appropriate code based on the desired memory order (and hence decoder complexity). The results in Fig. 2.5 illustrate the performance of the best codes for each memory order. For the turbo code, each increase in memory order translates into over an order of magnitude improvement in the error floor. However, for the RPCC+turbo codes, the situation is slightly different. At low $E_b/N_0$, there is a huge gain from going from the (5,7) memory-two code to the (15,13) memory-three code, but the gain from going on to the (33,23) memory-four code is smaller. Interestingly, the (5,7) code has a much steeper slope in the error-floor region, which will eventually result in it providing lower $P_B$ at high $E_b/N_0$. Note that the worst of the RPCC+turbo codes still yields $P_B < 10^{-7}$ for $E_b/N_0 > 1.6$ dB, which is sufficient for most applications and is beyond the region that we can simulate accurately.
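The octal notation and the primitivity claims above can be checked mechanically. The sketch below assumes that the binary expansion of the octal number, written without leading zeros, lists the coefficients of $D^0, D^1, \ldots$ from left to right; this reading reproduces the polynomials quoted in the text (5 gives $1 + D^2$, 22 gives $1 + D^3$, and 13 gives $1 + D^2 + D^3$). It uses the standard fact that a degree-$m$ polynomial with nonzero constant term is primitive exactly when $D$ has multiplicative order $2^m - 1$ modulo that polynomial. The helper names are hypothetical.

```python
def octal_to_poly(octal):
    """Parse an octal generator into a GF(2) polynomial stored as an int (bit i = coeff of D^i).

    Assumed convention: the binary expansion without leading zeros lists the
    coefficients of D^0, D^1, ... from left to right.
    """
    bits = bin(int(octal, 8))[2:]
    return sum(1 << i for i, b in enumerate(bits) if b == "1")


def poly_str(p):
    """Human-readable form, e.g. 0b1001 -> '1 + D^3'."""
    terms = []
    for i in range(p.bit_length()):
        if (p >> i) & 1:
            terms.append("1" if i == 0 else "D" if i == 1 else f"D^{i}")
    return " + ".join(terms)


def has_factor_one_plus_d(p):
    """(1 + D) divides p(D) iff p(1) = 0 over GF(2), i.e. an even number of terms."""
    return bin(p).count("1") % 2 == 0


def is_primitive(p):
    """True if p(D) of degree m is primitive: D has multiplicative order 2^m - 1 mod p."""
    m = p.bit_length() - 1
    if m < 1 or not (p & 1):
        return False
    x, k = 1, 0
    while True:
        x <<= 1                  # multiply by D
        if (x >> m) & 1:
            x ^= p               # reduce modulo p(D)
        k += 1
        if x == 1:
            return k == 2**m - 1
```

Under this convention, the 3GPP feedback polynomial 13 ($1 + D^2 + D^3$) is reported as primitive, while the feedback polynomials 5, 17, and 22 of the (7,5), (13,17), and (37,22) codes are reported as nonprimitive and divisible by $1 + D$, matching the factorizations given above.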
In most wireless applications, the code would be operated below 1.6 dB in order to achieve higher energy efficiency. In this region, the performance will not necessarily be predicted accurately by these error bounds. As $E_b/N_0$ falls below the cutoff rate, the performance becomes limited by the failure of the iterative turbo decoder to converge to a codeword. In the next section, we investigate the effects of decoder convergence.

Figure 2.5: Comparison of performance for best turbo codes and RPCC+turbo codes of memory 2, 3, and 4 (probability of block error versus $E_b/N_0$ in dB for the (5,7), (15,13), and (33,23) turbo and RPCC+turbo codes).

2.5 Density Evolution-Based Performance Analysis

In iterative soft-input soft-output (SISO) decoding, the limits of decoder convergence can be predicted by density evolution. In iterative decoding, the extrinsic information produced by the decoders improves as the iterations progress, and density evolution measures this improvement quantitatively. For example, the structure of a turbo decoder that has two constituent convolutional decoders can be simplified as shown in Figure 2.6. Each turbo decoder iteration consists of one decoding by constituent decoder 1 and one decoding by constituent decoder 2, so each constituent decoding can be considered a half iteration [60]. The input and output SNRs of the decoders at the half iterations are denoted by $\mathrm{SNR1}_{in}$, $\mathrm{SNR1}_{out}$, $\mathrm{SNR2}_{in}$, and $\mathrm{SNR2}_{out}$. The convergence of the iterative decoding of two concatenated codes can be examined graphically by plotting the input SNR versus the output SNR for each constituent code. The axes for the second code are switched so that the output SNR of decoder 1 ($\mathrm{SNR1}_{out}$) and the input SNR of decoder 2 ($\mathrm{SNR2}_{in}$) lie on the same axis. Since the output from constituent decoder 1 serves as the input to constituent decoder 2 and vice versa, this approach puts the output SNR for one half iteration on the same axis as the input SNR for the next half iteration.

Figure 2.6: Iterative turbo decoder (decoders 1 and 2 with input and output SNRs $\mathrm{SNR1}_{in}$, $\mathrm{SNR1}_{out}$, $\mathrm{SNR2}_{in}$, $\mathrm{SNR2}_{out}$; the channel input depends on $E_b/N_0$, and each decoder input is generated by actual density evolution or the Gaussian approximation).

In this work, to remove the effects of finite interleaver size and of the use of a particular interleaver, the output from one decoder is not used directly as the input of the next decoder. Instead, we use either the actual density evolution approach [60] or the Gaussian approximation approach to generate the input to the next decoder according to the density of the previous decoder's output. In the actual density evolution approach, the probability density of the extrinsic information at the output of a decoder is determined experimentally and used to generate the input. If we assume that the interleaver used in the scheme is very large and random, then the extrinsic information values of the bits are approximately independent and identically distributed. Let $L$ denote the extrinsic information for a particular bit. The probability density function $f(l)$ of the extrinsic information is symmetric [20], i.e., $l = \log[f(l)/f(-l)]$. For turbo and turbo-like codes, the probability density functions $f(l_{in})$ and $f(l_{out})$ can be approximated by Gaussian probability density functions [61, 62]. When the Gaussian approximation is applied, only two parameters, the mean $\mu$ and the variance $\sigma^2$, are required to specify the density of the extrinsic information. We define the signal-to-noise ratio as

$$\mathrm{SNR} = \mu^2/\sigma^2. \tag{2.3}$$

If $f(l)$ is both Gaussian and symmetric, then the variance can be expressed as $\sigma^2 = 2\mu$, which means that the SNR can be specified in terms of one parameter as $\mathrm{SNR} = \mu/2$ [60]. However, the true density function $f(l)$ is not exactly Gaussian, so this expression for the SNR does not exactly match that given by (2.3). The two formulas predict slightly different performance, and the second formula (which enforces the symmetry condition with the Gaussian approximation in the definition of the SNR) gives a slightly better prediction of decoder convergence [60]. For the actual density approach, the rejection method is used to generate the input extrinsic information according to the empirically determined density from the previous half iteration. For the Gaussian approximation, the symmetry condition is applied, so the input extrinsic information is generated as Gaussian with the mean $\mu$ from the previous half iteration and variance $2\mu$.

Figure 2.7: (5,7) turbo code density evolution ($\mathrm{SNR1}_{out}$, $\mathrm{SNR2}_{in}$ versus $\mathrm{SNR1}_{in}$, $\mathrm{SNR2}_{out}$ for the Gaussian approximation and the actual density; the first five iterations and the tunnel between the two curves are marked).
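The two ways of generating decoder inputs described above can be sketched with NumPy. Under the Gaussian approximation with the symmetry condition, samples are drawn with mean $\mu$ and variance $2\mu$; for the actual-density approach, the sketch draws from a histogram of previously observed extrinsic values using the rejection method. The all-zero-codeword assumption (positive mean), the histogram estimate of the density, the bin count, and the function names are assumptions of this sketch rather than details given in the text.

```python
import numpy as np


def gaussian_llrs(mu, n, rng):
    """Symmetric Gaussian extrinsic values: mean mu, variance 2*mu (all-zero codeword assumed)."""
    return rng.normal(loc=mu, scale=np.sqrt(2.0 * mu), size=n)


def snr_moments(l):
    """SNR from the first two moments, as in (2.3): mu^2 / sigma^2."""
    return l.mean() ** 2 / l.var()


def snr_symmetric(l):
    """SNR under the symmetric-Gaussian assumption: mu / 2."""
    return l.mean() / 2.0


def rejection_sample(samples, n, rng, bins=200):
    """Draw n values from a histogram estimate of the density of `samples` (rejection method)."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    f_max = hist.max()
    out = np.empty(0)
    while out.size < n:
        x = rng.uniform(edges[0], edges[-1], size=2 * n)          # uniform proposals
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
        keep = rng.uniform(0.0, f_max, size=x.size) < hist[idx]   # accept w.p. f(x)/f_max
        out = np.concatenate([out, x[keep]])
    return out[:n]
```

For samples drawn with $\mu = 4$, both `snr_moments` and `snr_symmetric` should return values close to 2; for real decoder outputs, which are not exactly Gaussian, the two estimates differ slightly.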

The results in Figure 2.7 illustrate the progress of the SNR over the decoder's iterations for a turbo code that uses identical (5,7) constituent codes at $E_b/N_0 = 0.5$ dB. The SNR improvement at each half iteration follows a stair-step pattern through the tunnel between the two curves. If the two curves are close together and form a long, narrow tunnel, then the decoder needs many iterations to escape from the tunnel, which means that the iterative decoder converges slowly. A larger distance between the two curves and a short, wide tunnel mean that the decoder converges quickly. The starting point of the upper curve, which corresponds to $\mathrm{SNR1}_{in} = 0$, changes with $E_b/N_0$: for lower $E_b/N_0$, the starting point has a smaller value. We can determine the iterative decoding threshold by decreasing $E_b/N_0$ until the two curves touch. A comparison between two different turbo codes at $E_b/N_0 = 0.5$ dB is shown in Figure 2.8. The turbo code that uses the (15,13) constituent code has a wider tunnel than the turbo code that uses the (5,7) constituent code. From this observation, we can predict that the turbo code that uses the (15,13) constituent code will converge better than the turbo code that is based on the (5,7) constituent code.

2.6 Code Design Based on Density Evolution-Based Analysis

As shown by Divsalar and Dolinar [60], density evolution can be used to design codes based on their convergence behavior. By choosing codes that have better decoder convergence, the waterfall region of the error probability curves can be moved to the left (to lower $E_b/N_0$). Thus, as long as the codes designed via density evolution do not have high error floors, they can be used effectively at lower $E_b/N_0$. In this section, we apply density evolution to two code design issues for the RPCC+turbo code.
In Section 2.6.1, we investigate the effect on decoder convergence of choosing different constituent codes for the inner turbo code. In Section 2.6.2, we investigate the effects of using unequal energies for the systematic and parity symbols of the RPCC+turbo codes.
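Both investigations rely on the stair-step convergence test illustrated in Figure 2.7. The sketch below traces that trajectory for two transfer curves supplied as Python functions. The curves used here are hypothetical toy curves, not measured density-evolution curves: `f1(x) = s + x` plays the role of decoder 1, with the offset `s` standing in for the $E_b/N_0$-dependent starting point at $\mathrm{SNR1}_{in} = 0$, and `f2(y) = y**2 / (1 + y)` plays the role of decoder 2. For these toy curves the tunnel stays open at all SNRs exactly when $s \ge 1$; for $s < 1$ the curves intersect at $x = s^2/(1 - s)$, where the trajectory gets stuck.

```python
def stair_step(f1, f2, target, max_half_iters=200):
    """Trace the stair-step trajectory between two transfer curves.

    f1: SNR1_in -> SNR1_out (= SNR2_in); f2: SNR2_in -> SNR2_out (= SNR1_in).
    Returns (converged, points); converged is True once SNR1_in reaches target.
    """
    x = 0.0                      # SNR1_in before the first iteration
    points = [(x, 0.0)]
    for _ in range(max_half_iters // 2):
        y = f1(x)                # half iteration by decoder 1 (vertical step)
        points.append((x, y))
        x = f2(y)                # half iteration by decoder 2 (horizontal step)
        points.append((x, y))
        if x >= target:
            return True, points
    return False, points


def toy_curves(s):
    """Hypothetical toy transfer curves (not measured); s mimics the E_b/N_0 offset."""
    return (lambda x: s + x), (lambda y: y * y / (1.0 + y))
```

Lowering `s` until `stair_step` no longer reaches the target mirrors the threshold search described in Section 2.5, in which $E_b/N_0$ is decreased until the two curves touch.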