Error Performance Analysis of a Concatenated Coding Scheme with 64/256-QAM Trellis Coded Modulation for the North American Cable Modem Standard

Dojun Rhee and Robert H. Morelos-Zaragoza
LSI Logic Corporation
1551 McCarthy Blvd., MS G-820
Milpitas, CA 95035

August 27th, 1998

Abstract

In this paper, concatenated coding schemes with Reed-Solomon (RS) coding in the outer stage and trellis coded modulation (TCM) in the inner stage are analyzed. The TCM decoder incorporates a Viterbi decoder compliant with ITU-T Recommendation J.83, Annex B (North American digital video transmission specifications for Multimedia Cable Network System (MCNS)). The module is suitable as the inner stage of a concatenated decoding scheme. A triple-error-correcting extended RS (128,122,7) code over GF(2^7) is used in the outer stage. Union bounds on the bit error performance of the concatenated coding schemes, with 64-QAM and 256-QAM signal constellations over AWGN channels, are derived and verified by computer simulations. The effects of quantization of input symbols and of trace-back depth in the Viterbi decoder are simulated as well.

Keywords: Concatenated Codes, Reed-Solomon Codes, Trellis Coded Modulation, Viterbi Decoding.

This paper was presented in part at the 1998 International Symposium on Information Theory (ISIT 98), M.I.T., Boston, 1998.

1. Introduction

The ITU-T Recommendation J.83, Annex B [1], describes the framing structure, channel coding, and channel modulation of a digital multi-service television distribution system specified for a cable channel. The design of the modulation, interleaving and coding is based upon testing and characterization of cable systems in North America [1-6]. The digital modulation formats are 64-QAM and 256-QAM, with the QAM symbol rate and occupied bandwidth optimized for the 6 MHz NTSC channel plan employed in North America. The forward error correction (FEC) is based on a concatenated coding scheme that produces high coding gain with moderate complexity and overhead. The cable channel (including optical fiber) is primarily regarded as a bandwidth-limited linear channel, with a balanced combination of white noise, interference, and multi-path distortion. QAM signalling techniques and concatenated coding are well suited to this application and channel.

Figure 1 illustrates the basic concatenated FEC technique. The FEC section is composed of four processing layers. There are no dependencies on input data protocol in any of the FEC layers. FEC synchronization is fully internal and transparent. The processing layers are as follows:

1) A (128,122,7) RS code provides block encoding and decoding to correct up to three symbols within an RS block.
2) A convolutional interleaver and de-interleaver evenly disperse the symbols, preventing a long burst of symbol errors from being sent to the RS decoder.
3) A scrambler and de-scrambler randomize the data on the channel to allow effective QAM demodulator synchronization.
4) A trellis encoder and decoder provide convolutional encoding and, with it, the possibility of soft-decision trellis decoding of random channel errors.

The paper is organized as follows: In Section 2, the encoding and decoding of a TCM code over a 64-QAM constellation are discussed. Section 3 deals with TCM encoding and decoding over a 256-QAM constellation.
In Section 4, the performance of the inner TCM code with respect to differential encoding, bits-to-signal mappings, finite quantization of received symbols and Viterbi decoding depth is presented. Also in Section 4, an error performance analysis of concatenated TCM schemes with 64/256-QAM constellations and an outer extended RS (128,122,7) code over GF(2^7) is presented. Approximated union bounds are derived and compared with computer simulations. The performance of a suboptimal double-error-correcting RS decoder is analyzed and simulated as well. It is shown to achieve the specification for cable modems with reduced implementation complexity and decoding delay. Finally, in Section 5, conclusions on this work are drawn.

2. 64-QAM TCM Encoding and Decoding

A block diagram of a 64-QAM trellis coded modulator is shown in Figure 2 (a). A group of four RS symbols (RS1, RS2, RS3, RS4) over GF(2^7) is encoded into five consecutive 64-QAM symbols. Therefore, the overall rate of the TCM code is 28/30. Symbols RS1 and RS2 are assigned to the in-phase (or real) I symbols, and symbols RS3 and RS4 are assigned to the quadrature (or imaginary) Q symbols. Of the 28 input bits that form a trellis group, two groups of 4 bits of the differentially precoded bit streams are separately encoded by a binary convolutional coder (BCC). Each BCC produces 5 coded bits, as shown in Figure 2 (a). The remaining bits are sent to the mapper uncoded. This produces an overall output of 30 bits.

For the I bits, the 4 MSBs of RS2 are input to the BCC, one bit at a time, LSB first. The remaining bits of RS2 and all 7 bits of RS1 are input to the mapper, uncoded, LSB first, one bit at a time. The four bits sent to the BCC produce 5 coded bits. The same process is applied to the Q bits.

The QAM mapper receives the coded and uncoded 3-bit I and Q. It uses these bits to address a look-up table which produces the 3-bit X and Y, as shown in Figure 5 (a). The 3-bit X and Y are then sent to the 64-QAM modem, where the signal constellation in Figure 5 (b) is generated.

2.1 Binary Convolutional Encoder

The trellis coded modulator includes a rate-4/5 punctured convolutional code, based on a rate-1/2 convolutional encoder, that is used to introduce redundancy into the LSBs of the trellis group. The convolutional encoder is a 16-state non-systematic rate-1/2 encoder with generators G = [25, 37] (octal). The outputs of the encoder are selected according to the puncturing matrix [P1, P2] = [0001:1111] (0 denotes NO transmission, 1 denotes transmission), which produces a single serial output stream. At the receiver, the de-puncturing matrix converts the rate-4/5 soft output stream into a rate-1/2 soft output stream for the rate-1/2 Viterbi decoder. The internal structure of the punctured encoder is shown in Figure 2 (b).

2.2 Differential Precoder

A differential precoder is used to obtain 90° rotationally invariant trellis coding. The key to a robust modem design is very fast recovery. In a non-rotationally-invariant design, a timing slip can cause a major resynchronization of the FEC, leading to a burst of errors at the FEC output. A differential encoder is shown in Figure 2 (c). The differential precoder allows the information to be carried by a change in phase, rather than by absolute phase. The LSBs of the I and Q components are differentially encoded. As shown in Figure 5 (b), if we mask out the LSBs of the I and Q symbols, 90° rotational invariance of the remaining uncoded 4 bits for 64-QAM, or uncoded 6 bits for 256-QAM, is inherent in the signal constellation.

2.3 64-QAM TCM Decoder

The 64-QAM demodulator produces soft In-Phase and Quadrature outputs from the 64-QAM signals received through the cable network. Before being fed into the rate-1/2 Viterbi decoder, each In-Phase and Quadrature soft output is depunctured with the depuncturing matrix [P1, P2] = [0001:1111] (0 denotes NO transmission, 1 denotes transmission), which produces a single serial soft output stream for the rate-1/2 Viterbi decoder. The structure of the decoder is shown in Figure 3, where m = 4.

With reference to the TCM encoder, note that the LSBs of the I and Q bits are encoded with separate convolutional encoders, and that the QAM mapper maps the 3-bit I and Q into 3-bit X and Y, each of which represents an 8-PAM signal point. For example, if X is 000, then the In-Phase component in Figure 5 (b) is always 1, and if Y is 111, then the Quadrature component in Figure 5 (b) is always -1. Therefore, at the TCM decoder, the In-Phase and Quadrature soft outputs from the 64-QAM demodulator can be decoded independently. Also, at the rate-1/2 Viterbi decoder, each In-Phase (or Quadrature) soft output is compared with the reference 8-PAM signal constellation points {-7, -5, -3, -1, 1, 3, 5, 7}.

Since a rate-1/2 16-state convolutional code is used in the TCM encoder, there are four distinct labels for each branch in the 16-state trellis, i.e., {00, 01, 10, 11}. Therefore, only the branch metrics for these four distinct branch labels need to be calculated. There are two bits in each branch label, and therefore two consecutive soft outputs must be fed into the branch metric generator. These two bits are used as the LSBs for the 8-PAM signal constellation mapping. For each bit, there are four parallel branches with MSB labels {00, 01, 10, 11}. These four parallel branch labels are compared with four 8-PAM signal points, which are determined by the LSB of the branch label.

For example, suppose 01 is the label of a branch in the rate-1/2 Viterbi decoder trellis. Then, for label bit 0, we calculate branch metrics with (000, 010, 100, 110) = (1, 5, -7, -3) in the 8-PAM signal constellation and find the label with minimum branch metric. For label bit 1, we calculate branch metrics with (001, 011, 101, 111) = (3, 7, -5, -1) in the 8-PAM signal constellation and find the label with minimum branch metric. After this, the two branch metrics are added, and the pairs of two MSBs are stored from the labels with minimum branch metric.

After Viterbi decoding, eight consecutive 2-bit pairs for parallel branches and 4 information bits from 10 consecutive In-Phase components of 64-QAM signals are recovered. The four recovered information bits are fed into the BCC to produce eight LSBs.
These eight LSBs are combined with the eight consecutive 2-bit pairs and fed into a puncturing device to generate five consecutive 3-bit X values. Five consecutive 3-bit Y values are recovered by the same procedure for the quadrature part. Then the 6-bit (X, Y) pairs are converted into I and Q based on the mapping table of Figure 5 (a). The recovered five consecutive 2-bit MSBs of I, five consecutive 2-bit MSBs of Q, and the pair of four information bits from each BCC, totalling 28 bits, are mapped into four RS symbols over GF(2^7). The X and Y values are also fed into a bit error counter for synchronization purposes.

3. 256-QAM TCM Encoding and Decoding

A block diagram of a 256-QAM trellis coded modulator is shown in Figure 4. There are 38 information bits encoded into five consecutive 256-QAM symbols. Therefore, the overall rate of the TCM code is 38/40. To form a trellis group to be input to the encoder, the RS codewords are serialized beginning with the MSB of the first symbol of the first RS codeword following the frame sync. Bits are then placed in trellis group locations from RS symbols in the order I_0, Q_0, I_1, Q_1, I_2, ..., Q_17, I_18, Q_18, as shown in Figure 4. For the sync trellis group, RS bits begin at location I_1 instead of I_0.

Of the 38 input bits that form a trellis group, two groups of 4 bits of the differentially precoded bit streams are separately encoded by a binary punctured convolutional coder (BCC). Each BCC produces 5 coded bits. The remaining bits are sent to the mapper uncoded.

For 256-QAM, the mapper receives the coded and uncoded 4-bit I and Q data from the trellis coded modulator. It uses these bits to address a look-up table which produces the 8-bit constellation symbol, i.e., X and Y. The look-up table for 256-QAM is shown in [1]. The 8-bit constellation symbol is then sent to the 256-QAM modulator, with the signal constellation arrangement shown in [1].

3.1 256-QAM TCM Decoding

The 256-QAM demodulator produces soft In-Phase and Quadrature outputs from the 256-QAM signals received through the cable network. Before being fed into the rate-1/2 Viterbi decoder, each In-Phase and Quadrature soft output is depunctured with the de-puncturing matrix [P1, P2] = [0001:1111] (0 denotes NO transmission, 1 denotes transmission), which produces a single serial soft output stream for the rate-1/2 Viterbi decoder. The decoder structure is shown in Figure 3.
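The puncturing and depuncturing steps described above can be illustrated with a minimal Python sketch. It assumes the rate-1/2 mother code with generators 25 and 37 (octal), the first generator output sent before the second, and the deletion of the 3rd, 5th and 7th bit in every block of 8 mother-code bits, as stated in Section 4.1.1. The function names and the use of None as an erasure marker are illustrative choices, not part of the standard.

```python
from itertools import cycle

G1, G2 = 0o25, 0o37                 # generators 10101 and 11111 (binary)
KEEP = (1, 1, 0, 1, 0, 1, 0, 1)     # assumed keep-mask: drop bits 3, 5, 7 of every 8


def conv_encode(bits):
    """Rate-1/2, 16-state encoder: two output bits per input bit."""
    state = 0                        # the four previous input bits
    out = []
    for u in bits:
        reg = (u << 4) | state       # 5-bit window, newest bit in the MSB
        out.append(bin(reg & G1).count("1") & 1)
        out.append(bin(reg & G2).count("1") & 1)
        state = reg >> 1             # shift the window by one input bit
    return out


def puncture(coded):
    """Keep 5 of every 8 mother-code bits (rate 1/2 becomes rate 4/5)."""
    return [c for c, k in zip(coded, cycle(KEEP)) if k]


def depuncture(symbols, erasure=None):
    """Re-insert an erasure at every punctured position."""
    it = iter(symbols)
    out = []
    for k in cycle(KEEP):
        if k:
            try:
                out.append(next(it))
            except StopIteration:
                break
        else:
            out.append(erasure)
    return out
```

In a soft-decision decoder the erasure would typically be a neutral value that contributes nothing to the branch metric; None is used here only to make the punctured positions visible.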

At the TCM encoder, the LSBs of the I and Q bits are encoded with separate convolutional encoders, and the QAM mapper maps the 4-bit I and Q into 4-bit X and Y, each of which represents a 16-PAM signal point. For example, if X is 0000, the In-Phase component is always 1, and if Y is 1111, the Quadrature component is always -1. Therefore, at the TCM decoder, the In-Phase and Quadrature soft outputs from the 256-QAM demodulator can be decoded independently. Also, at the rate-1/2 Viterbi decoder, each In-Phase (or Quadrature) soft output is compared with the reference 16-PAM signal constellation points {-15, -13, -11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15}.

The computation of branch metrics is similar to that for 64-QAM. Since a rate-1/2 16-state convolutional code is used in the TCM encoder, there are four distinct labels for each branch in the 16-state trellis, i.e., {00, 01, 10, 11}. Therefore, only the branch metrics for these four distinct branch labels need to be calculated. Since there are two bits in each branch label, two consecutive soft outputs must be fed into the branch metric generator (BMGR). These two bits are used as the LSBs for the 16-PAM signal constellation mapping. For each bit, there are eight parallel branches with MSB labels {000, 001, 010, 011, 100, 101, 110, 111}. These eight parallel branch labels are compared with eight 16-PAM signal points, which are determined by the LSB of the branch label.

For example, suppose 01 is the label of a branch in the rate-1/2 Viterbi decoder trellis. Then, for label bit 0, we calculate branch metrics with (0000, 0010, 0100, 0110, 1000, 1010, 1100, 1110) = (1, 5, 9, 13, -15, -11, -7, -3) in the 16-PAM signal constellation and find the label with minimum branch metric. For label bit 1, we calculate branch metrics with (0001, 0011, 0101, 0111, 1001, 1011, 1101, 1111) = (3, 7, 11, 15, -13, -9, -5, -1) in the 16-PAM signal constellation and find the label with minimum branch metric. After that, the two branch metrics are added, and the pairs of three MSBs are stored from the labels with minimum branch metric.
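The parallel-branch search described above can be sketched in a few lines of Python. The sketch assumes that a PAM label is read as a two's-complement integer v and mapped to the amplitude 2v + 1, which is consistent with the examples in the text (000 maps to 1, 111 maps to -1), and that the branch metric is the squared Euclidean distance. The function names are illustrative only.

```python
def pam_level(label, bits):
    """Amplitude of a PAM label read as a two's-complement integer v: 2*v + 1."""
    v = label - (1 << bits) if label >= 1 << (bits - 1) else label
    return 2 * v + 1


def subset_metric(r, coded_bit, bits):
    """Nearest point among the labels whose LSB equals coded_bit.

    Returns (squared distance, MSB label of the winning parallel branch).
    """
    candidates = []
    for msb in range(1 << (bits - 1)):
        label = (msb << 1) | coded_bit
        candidates.append(((r - pam_level(label, bits)) ** 2, msb))
    return min(candidates)


def branch_metric(r_pair, branch_label, bits):
    """Add the two subset metrics of a two-bit branch label; keep both MSB decisions."""
    total = 0.0
    msbs = []
    for r, c in zip(r_pair, branch_label):
        d, msb = subset_metric(r, c, bits)
        total += d
        msbs.append(msb)
    return total, msbs
```

With bits = 3 the function covers the 8-PAM case of Section 2.3 and with bits = 4 the 16-PAM case; for instance, branch_metric((0.8, 2.7), (0, 1), 4) scores branch label 01 against two consecutive soft outputs.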

3.2 Recovery of Uncoded Information Bits

After Viterbi decoding, eight consecutive 3-bit pairs for parallel branches and 4 information bits from 10 consecutive In-Phase components of 256-QAM signals are recovered. The four recovered information bits are fed into the BCC to produce eight LSBs. These eight LSBs are combined with the eight consecutive 3-bit pairs and are then fed into a puncturing device to generate five consecutive 4-bit X values. Five consecutive 4-bit Y values are recovered by the same procedure for the quadrature part. Then the 8-bit (X, Y) pairs are converted into I and Q. The recovered five consecutive 3-bit MSBs of I, five consecutive 3-bit MSBs of Q, and the pair of four information bits from each BCC amount to the estimated 38 information bits. At the same time, X and Y are fed into a bit error counter for synchronization purposes.

3.3 Viterbi Bit Error Rate Monitor

A performance monitor for the channel bit error rate is built into the decoder. Occurrences of bit errors are found by comparing an appropriately delayed version of the incoming channel data stream to the re-encoded TCM decoder output data stream. The depunctured data is delayed by the TCM decoder latency. The re-encoded symbol stream can then be compared on a symbol-by-symbol basis against the depunctured incoming channel stream. Any discrepancy between two respective symbols indicates a corrected error (or, with a much smaller probability, an erroneous output bit produced by a failure of the Viterbi decoder to decode correctly). Decoder input symbols that are marked as erasures are disregarded when error events are detected. The 256-QAM TCM decoder is shown in Figure 3.

4. Performance Analysis

4.1 Inner TCM Schemes

In this section, approximated union bounds and simulation results are presented on the error performance of the inner TCM schemes presented in the previous sections. The effects of both a finite number of quantization levels and the decoding depth in the Viterbi decoder are discussed.

4.1.1 An approximated union bound for TCM

In the following analysis, it is assumed that the output of the convolutional interleaver produces sequences of symbols corrupted by samples of additive white Gaussian noise (AWGN). An equivalent truncated binary block code is derived for each 16-state rate-4/5 punctured convolutional code (PCC). Each code operates over a one-dimensional signal set (8-PAM for 64-QAM TCM and 16-PAM for 256-QAM TCM). Thus a standard union bound can be derived on the probability of a bit error at the output of the TCM decoder.

The generator matrix of this equivalent binary block code [7] is obtained from the impulse response (or, equivalently, the interleaved generator sequences) of the rate-1/2 mother code, given by (11,01,11,01,11), together with the puncturing pattern (11,01,01,01). This pattern means deleting the 3rd, 5th and 7th columns (all-zero columns) of the generator matrix of the block code, for every 8 output bits corresponding to 4 information bits. This yields an equivalent (68,30) code whose generator matrix is listed in Table 1. The punctured positions correspond to all-zero columns and thus the effective length of the equivalent block code is 39. Each codeword in the equivalent block code corresponds to a path in the punctured code trellis, associated with 30 information bits, that starts and ends at the all-zero state. The weight distribution of the equivalent block code is shown in Table 2. For the other terms not shown, W(i) = 0. Then an approximated union bound on the probability of a bit error at the output of the TCM decoder is obtained as follows [8]

39 39 i P --W()N i i 2E b Q ---------i 2 i b < -- n N i + n N 0 Q n i i = 3 i = 8E b ---------i 2 N 0    (1)

where the energy normalization constant is 42 for 64-QAM and 170 for 256-QAM, and N denotes the effective number of nearest neighbors per signal point. The second term on the right-hand side of (1) accounts for errors in the uncoded bits, assuming that the coded bits are correctly decoded.
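The average symbol energies of 64-QAM and 256-QAM on the odd-integer grid are 42 and 170, which is one way to interpret the two constants quoted after the bound. The short Python check below assumes equiprobable signal points and the PAM levels listed in Sections 2.3 and 3.1; the helper name average_energy is illustrative, and the role of these values in the bound itself is an interpretation rather than something stated in the text.

```python
def average_energy(bits_per_axis):
    """Average energy of a square QAM constellation built from two PAM axes.

    Each axis uses the odd-integer levels -(2^b - 1), ..., -1, 1, ..., 2^b - 1.
    """
    size = 2 ** bits_per_axis
    levels = [2 * k + 1 - size for k in range(size)]
    pam_energy = sum(a * a for a in levels) / size
    return 2 * pam_energy            # independent I and Q axes


print(average_energy(3), average_energy(4))
```

The same values follow from the closed form 2(M - 1)/3 for an M-point square QAM constellation with minimum distance 2.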

4.1.2 Effect of bits-to-signal mapping

Fig. 6 shows the bit error rate (BER) and the approximated bounds after the X-Y mapping and the I-Q mapping over the 64-QAM and 256-QAM signal constellations, respectively. With N = 2 in (1), a tight upper bound is obtained for the probability of a bit error at the output of the Viterbi decoders (X-Y mapping). The simulated performance is very close to this bound, as expected, since the TCM Viterbi decoders operate over one-dimensional PAM signal sets.

4.1.3 Quantization Levels and Viterbi Decoding Depth

In Fig. 7, the error performance of the inner TCM with 64-QAM is shown for various quantization levels. The results show that quantization to 8 bits is sufficient to achieve good performance. Fig. 7 also shows the error performance of the inner TCM with 64-QAM for various decoding depths, L, and quantization bits, M. Based on the simulation results, a decoding depth L = 72, with M = 8 bits per sample from the demodulator, is considered sufficient to achieve good performance. These values were used in all the computer simulation results presented in the next sections.

4.1.4 Differential Decoder

With reference to Fig. 2 (c), it can be shown that the differential decoder operates on the pairs (X_{j-1}, Y_{j-1}) and (X_j, Y_j) to obtain an estimate of the transmitted pair (W_j, Z_j) through equations (2) and (3) below, where the operations are over GF(2).

W_j = X_j + Y_{j-1} + (X_j + Y_j)(X_{j-1} + Y_{j-1})    (2)

Z_j = X_j + Y_j + X_{j-1} + Y_{j-1}    (3)

By analyzing all possible error patterns in a received pair (R_Xj, R_Yj) = (X_j, Y_j) + (E_Xj, E_Yj), with (E_Xj, E_Yj) in {0, 1}^2, it is possible to derive an upper bound on the average number of errors in the estimated data (W_j, Z_j). In the worst case, an error in the received pair (R_Xj, R_Yj) will cause both outputs (W_j, Z_j) to be in error. However, in many cases either only one output or no output will be in error. To obtain a tight bound on the error performance, for each of the 16 combinations of (X_{j-1}, Y_{j-1}) and (X_j, Y_j) pairs, the number of errors in (W_j, Z_j) caused by errors (E_Xj, E_Yj) (4 combinations) is quantified. The average number of errors in a symbol interval is bounded above by 4/3. Since errors affect two symbol intervals, the average number of errors in the estimated data, given an error in the received two-dimensional values (R_Xj, R_Yj), is at most 8/3.

4.1.5 Error performance after (I,Q) Mapping

A second (I-Q) mapping is applied to ensure 90° rotational invariance, as explained before. It is possible to show, following Ref. [9], that the average number of bit errors per symbol for the 64-QAM constellation is 1.27, while for the 256-QAM constellation it is 1.193. Therefore, the combination of TCM decoders, differential decoder and (X,Y)-to-(I,Q) mapping has a probability of a bit error

P_e < 3.3889 P_b    (4)

for 64-QAM and

P_e < 3.1806 P_b    (5)

for 256-QAM, with P_b given by (1). On the other hand, the simulation results show that the degradation in performance due to the introduction of the I-Q mapping is about 0.2 dB, which corresponds to approximately two times the average bit error rate at the output of the TCM decoders.
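The error-propagation count of the differential decoder can be checked by brute force. The Python sketch below encodes with the precoder equations of Fig. 2 (c), decodes with the inverse obtained by solving those equations over GF(2), and averages the number of wrong output bits over the three nonzero error patterns in the current received pair. The function names and the uniform averaging over states and error patterns are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def precode(w, z, xp, yp):
    """Differential precoder of Fig. 2 (c); xp, yp are the previous outputs."""
    s = xp ^ yp
    x = w ^ xp ^ (z & s)
    y = z ^ w ^ yp ^ (z & s)
    return x, y


def decode(x, y, xp, yp):
    """Inverse of precode over GF(2), given the previous pair (xp, yp)."""
    s = xp ^ yp
    z = x ^ y ^ s
    w = x ^ yp ^ ((x ^ y) & s)
    return w, z


def average_errors_per_interval():
    """Mean number of wrong bits in (W, Z) per nonzero error in (X_j, Y_j)."""
    total = 0
    cases = 0
    for xp, yp, w, z in product((0, 1), repeat=4):
        x, y = precode(w, z, xp, yp)
        for ex, ey in ((0, 1), (1, 0), (1, 1)):
            w_hat, z_hat = decode(x ^ ex, y ^ ey, xp, yp)
            total += (w_hat != w) + (z_hat != z)
            cases += 1
    return Fraction(total, cases)
```

Under these assumptions the average evaluates to 4/3 per symbol interval, the value quoted in the text; since a corrupted pair is also used as the previous pair in the next interval, the figure doubles to 8/3, and multiplying by 1.27 or 1.193 gives approximately the factors in (4) and (5).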

4.2 Concatenated RS-TCM schemes

In this section, an analysis of the performance of concatenated RS trellis coded 64-QAM and 256-QAM modulation schemes is presented. Again, it is assumed that symbols are corrupted by samples of AWGN.

4.2.1 Approximated Union Bound for Concatenated TCM and RS Decoders

The probability of a bit error at the output of an RS (128,122,7) decoder has the following approximated upper bound [9]

P_b < \frac{64}{127} \sum_{j=4}^{128} \frac{j+3}{128} \binom{128}{j} P_s^j (1 - P_s)^{128-j}    (6)

P_s = 1 - (1 - P_e)^7    (7)

where P_e is the probability of a bit error in the inner stage, given by (4) and (5) for 64-QAM and 256-QAM, respectively. The approximated union bound (6) is plotted in Fig. 8 for concatenated RS-TCM 64-QAM and 256-QAM. In the normalization of the signal-to-noise ratio per bit, E_b/N_0, note that for the concatenated RS-TCM 64-QAM scheme the rate is 6 x (28/30) x (122/128) = 5.3375 bits/symbol, while for the concatenated RS-TCM 256-QAM scheme the rate is 8 x (38/40) x (122/128) = 7.24375 bits/symbol.

Also shown in the figure are computer simulation results of these two concatenated RS-TCM schemes. For both 64-QAM and 256-QAM, a good match between the approximated union bounds and the simulation results can be observed.

4.2.2 Using a Two-Error-Correcting RS Decoder

In Fig. 8, the error performance of a (suboptimal) two-error-correcting RS decoder is compared with that of a three-error-correcting decoder for the outer RS (128,122,7) code. This RS decoder still operates on received RS words of length 128, but it corrects up to two random errors only, ignoring the extended position (which carries no information). This has an impact on the burst error correcting capability due to the interleaver, as t = 2 instead of t = 3 random 7-bit byte errors are corrected. Let λ_max denote the maximum length of a correctable burst error. Then, with the double-error-correcting RS decoder, the maximum correctable burst error length is reduced to (2/3) λ_max.

With AWGN only, the bound used to estimate the probability of a bit error at the output of the double-error-correcting RS decoder is [10]

P_b < \frac{64}{127} \sum_{j=3}^{128} \frac{j+2}{128} \binom{128}{j} P_s^j (1 - P_s)^{128-j}    (8)

with P_s given by (7). For 64-QAM, the use of this two-error-correcting decoder causes a loss of approximately 0.5 dB with respect to the full three-error-correcting decoder. In the case of 256-QAM, the loss is about 0.4 dB.

The degradation in performance due to a double-error-correcting decoder is relatively small and the cable modem standard specification [2] can be met: For 64-QAM, at a BER of 10^-8 the required signal-to-noise ratio (SNR) is E_s/N_0 = 23.5 dB, or E_b/N_0 = 16.2 dB. On the other hand, in the case of 256-QAM, an SNR of E_s/N_0 = 30 dB, or E_b/N_0 = 21.4 dB, is needed.

Based on the results presented in this section, with a three-error-correcting RS decoder, the gain margin over the specification is at least 1.8 dB for 64-QAM and 256-QAM. With a double-error-correcting RS decoder, the gain margin is at least 1.3 dB.

The complexity, in number of GF operations and memory size, of a t-error-correcting RS decoder is O(t^2). It follows that a double-error-correcting decoder has about 4/9 (44%) of the complexity of a triple-error-correcting decoder. We conclude that, in situations where the complexity of the RS decoder is an issue, the proposed decoder gives a good trade-off between random and burst error performance, coding gain margin and implementation complexity.
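The outer-code bounds and the SNR normalization of this section can be evaluated numerically. The Python sketch below assumes the (128,122) extended RS code over GF(2^7) of J.83 Annex B, the bound in the form sum over j from t+1 to n of ((j+t)/n) C(n,j) P_s^j (1-P_s)^(n-j) scaled by 64/127, and a symbol error probability P_s = 1 - (1 - P_e)^7. The function names are illustrative, and the overall rate is taken as the product of the modulation, TCM and RS rates quoted in the text.

```python
from math import comb, log10


def rs_bit_error_bound(p_e, t, n=128, m=7):
    """Approximate bit error bound for a t-error-correcting RS decoder."""
    p_s = 1.0 - (1.0 - p_e) ** m              # symbol error probability
    scale = 2 ** (m - 1) / (2 ** m - 1)       # 64/127 for m = 7
    total = 0.0
    for j in range(t + 1, n + 1):
        total += (j + t) / n * comb(n, j) * p_s ** j * (1.0 - p_s) ** (n - j)
    return scale * total


def bits_per_symbol(qam_bits, tcm_k, tcm_n, rs_k=122, rs_n=128):
    """Information bits per QAM symbol after TCM and RS overhead."""
    return qam_bits * tcm_k / tcm_n * rs_k / rs_n


def es_to_eb_db(es_n0_db, rate):
    """Convert Es/N0 in dB to Eb/N0 in dB for a given rate in bits/symbol."""
    return es_n0_db - 10.0 * log10(rate)


for t in (3, 2):
    print(t, rs_bit_error_bound(1e-4, t))
print(es_to_eb_db(23.5, bits_per_symbol(6, 28, 30)))
print(es_to_eb_db(30.0, bits_per_symbol(8, 38, 40)))
```

The inner-stage error probability 1e-4 in the loop is an arbitrary illustration value, not a figure from the simulations; in the analysis it would be replaced by the P_e obtained from (4) or (5) at the operating point of interest.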

5. Conclusions

In this paper, an error performance analysis was presented of concatenated TCM schemes over 64-QAM and 256-QAM constellations, with an extended RS (128,122,7) code over GF(2^7) in the outer stage. Approximated union bounds were derived and verified by computer simulations. The effects of finite quantization of the demodulated symbols and of decoding depth in the Viterbi decoder were also discussed. A simple double-error-correcting RS decoder was shown to meet the specification for cable modems, resulting in a good trade-off between error performance and implementation complexity.

REFERENCES

[1] Digital Multi-Programme Systems For Television Sound and Data Services For Cable Distribution, ITU-T Recommendation J.83, Annex B, 1995.
[2] Prodan, R. et al., Analysis of Cable System Digital Transmission Characteristics, NCTA Technical Papers, 1994.
[3] Prodan, R. et al., Cable System Transient Impairment Characterization, NCTA Technical Papers, 1994.
[4] Waltrich, J., Results of 64 QAM Field Tests over Cable and Alternate Media, IBC Conference Papers, 1994.
[5] Waltrich, J. et al., The Impact of Microreflections on Digital Transmission over Cable and Associated Media, NCTA Technical Papers, 1992.
[6] Waltrich, J., Channel Characterization for Digital Transmission, Proceedings SCTE Conference on Emerging Technologies, 1993.
[7] S. Lin and D. Costello, Jr., Error Control Coding: Fundamentals and Applications, Prentice-Hall, 1983.
[8] M.P.C. Fossorier, S. Lin and D. Rhee, Bit Error Probability for Maximum Likelihood Decoding of Linear Block Codes, Proc. 1996 International Symposium on Information Theory and Its Applications (ISITA 96), Victoria, Canada, pp. 606-609, Sept. 1996, and accepted for publication in IEEE Trans. Inform. Theory.
[9] W.J. Weber, Differential Encoding for Multiple Amplitude and Phase Shift Keying Systems, IEEE Trans. Comm., vol. COM-26, no. 3, pp. 385-391, March 1978.
[10] G.C. Clark, Jr. and J.B. Cain, Error-Correction Coding for Digital Communications, Plenum Press, 1981.

Figure 1. Concatenated Coded System. Transmit chain: Data -> RS Encoder -> Convolutional Interleaver -> Scrambler -> TCM Encoder. Receive chain: TCM Decoder -> De-Scrambler -> Convolutional De-interleaver -> RS Decoder -> Data.

Figure 2. 64-QAM TCM Encoder.
(a) 64-QAM TCM Encoder: a parser splits the RS symbols RS1 to RS4 into I and Q bit streams; the uncoded bits go directly to the QAM mapper, while the remaining I and Q bits pass through the differential encoder and two rate-1/2, 16-state BCCs punctured to rate 4/5 before reaching the QAM mapper, which outputs X and Y to the QAM constellation.
(b) Punctured Convolutional Encoder: generators G1 = 25 and G2 = 37 (octal), four delay elements.
(c) Differential Precoder: inputs (W_j, Z_j), outputs (X_j, Y_j), with
X_j = W_j + X_{j-1} + Z_j (X_{j-1} + Y_{j-1})
Y_j = Z_j + W_j + Y_{j-1} + Z_j (X_{j-1} + Y_{j-1})

Figure 3. 64/256-QAM TCM Decoder. Blocks shown: demodulator FIFO with 8-bit I_SOFT and Q_SOFT outputs, depuncturing (DEPUNC), serial-to-parallel converters (S2P), two rate-1/2 Viterbi decoders, re-encoders with puncturing, delay lines for the m MSBs (m = 6 for 256-QAM and 4 for 64-QAM), (X,Y) table and differential decoder producing (I,Q), hard-decision logic, and a comparator producing the Data Valid signal.

Figure 4. 256-QAM trellis coded modulator block diagram. Panels: bit order of the 38-bit non-sync trellis group; bit order of the 38-bit sync trellis group (carrying sync bits S_0 to S_7); parser, differential encoder, two BCCs and 256-QAM mapper producing X and Y for the QAM constellation.

Figure 5 (a). 64-QAM Mapping Table. Axes: I and Q labels (3 bits each); entries: (X,Y) labels; the LSB of each label is the coded bit.

Figure 5 (b). 64-QAM Constellation Mapping Table. Rotationally invariant trellis coding, 90-degree phase invariant, with differentially coded I and Q bits. Axes: X and Y, each taking the values -7, -5, -3, -1, 1, 3, 5, 7.

Table 1. Generator matrix of equivalent block code of a rate-4/5 PCC (30 rows by 68 columns, binary entries).

Table 2. Weight distribution of equivalent block code for a rate-4/5 PCC.

 i    W(i)
-------------------
 3    4.0
 4    62.0
 5    400.0
 6    203.0
 7    7973.0
 8    30636.0
 9    0756.0
10    332006.0
11    922226.0
12    2305588.0
13    525688.0
14    0690022.0
15    9889250.0
16    33729982.0
17    52254476.0
18    7405670.0
19    965550.0
20    455656.0
21    2576560.0
22    25564556.0
23    550480.0
24    97369340.0
25    7509300.0
26    52838082.0
27    33843030.0
28    9672540.0
29    0335752.0
30    4893594.0
31    208090.0
32    79068.0
33    267892.0
34    79746.0
35    20652.0
36    4586.0
37    86.0
38    33.0
39    7.0

Figure 6: Error performance of inner TCM over 64-QAM and 256-QAM: approximated union bound with two neighbors [TCM_64_2NN, TCM_256_2NN], simulations after X-Y mapping [Sim_TCM64_XY, Sim_TCM256_XY], approximated union bound [TCM64_diff, TCM256_diff] and simulations [Sim_TCM64_IQ, Sim_TCM256_IQ] after I-Q mapping.

Figure 7: Various decoding depths (L) and quantization bits (M) for the TCM Viterbi decoder over 64-QAM.

Figure 8: (i) Theoretical 64-QAM [64-QAM] and 256-QAM [256-QAM]; (ii) approximated union bounds [RS_TCM64_diff, RS_TCM256_diff] and simulations [Sim_RS_TCM64, Sim_RS_TCM256] of concatenated RS-TCM, over 64-QAM and 256-QAM, using a triple-error-correcting RS decoder; and (iii) approximated union bounds [RS_TCM64_T2_diff, RS_TCM256_T2_diff] and simulations [Sim_RS_TCM64_T2, Sim_RS_TCM256_T2] of concatenated RS-TCM, over 64-QAM and 256-QAM, using a double-error-correcting RS decoder.