
TURBO4 : A HIGH BIT-RATE CHIP FOR TURBO CODE ENCODING AND DECODING

Michel Jézéquel*, Pierre Pénard**

* ENST de Bretagne, Technopôle Brest Iroise, BP 832, 29285 Brest Cedex, France. Email : michel.jezequel@enst-bretagne.fr
** CCETT, 4 rue du Clos Courtel, BP 59, 35512 Cesson-Sévigné Cedex, France.

© 1999 The Institution of Electrical Engineers. Printed and published by the IEE, Savoy Place, London WC2R 0BL, UK.

1. Abstract

This paper deals with an experimental IC developed for encoding and decoding turbo codes. The chip includes an encoder and a decoding module which performs one iteration of the decoding process. All the necessary interleaving memories and delay lines are included in the circuit. The encoder is made up of a parallel concatenation of 2 convolutional encoders (constraint length = 5) separated by an interleaver (64 x 32 matrix). The decoder uses the SOVA technique, and a dedicated module achieves the synchronization task as well as a supervision function. Very high performance can be achieved in 5 iterations : with a QPSK modulation, a BER of 258 is obtained with Eb/No = 2 dB. The turbo4 chip can work in continuous mode up to 54 Mbit/s useful data throughput and is well suited for data flow applications such as video broadcasting. The IC is designed in a 0.25 µm CMOS technology and its core size is less than 8 mm².

2. Introduction

Turbo codes are a new family of error correcting codes introduced by C. Berrou et al. [1,2]. The coding operation is based on a parallel concatenation of recursive and systematic convolutional codes. The decoding process is iterative and the performance of a decoder is a function of the number of iterations. A codec based on this principle provides performance very close to the theoretical channel limit.

3. Chip description

The circuit may be used either as an encoder or as a decoding module and, except for the I/O pads, the encoder does not share any hardware with the decoder. In order to reach a high bit-rate throughput, the hardware of one chip performs only one iteration of the decoding process. The chip has been designed and packaged in order to facilitate the cascading of several chips according to the required number of iterations.

3.1 The encoder (Figure 1)

Two identical Recursive Systematic Coders with a 4-bit memory are used to build the encoder. Their polynomials are 23, 35 (octal). The incoming data is fed to a first encoder which produces redundancy Y1, while the second encoder receives interleaved data and produces redundancy Y2. The overall basic rate is 1/3 but it can be increased by puncturing the output sequence. If a 1/2 coding rate is targeted, a built-in puncturing function provides the composite redundancy Y following the sequence : Y2 Y1 Y1 Y1.

One of the main issues of a turbo code decoder is the synchronization of its interleavers/de-interleavers and, for this purpose, a 64-bit word may be inserted on X or Y1 as follows : representing the data in the way they are written in the interleaving memory (a matrix of 64 rows by 32 columns), each bit of rank i of the synchronization word is combined with the last bit of row i.

3.2 The decoder (Figure 2)

The decoder is made up of 2 Soft Output Viterbi Algorithm decoders (SOVA1 and SOVA2), interleaving and de-interleaving modules, the necessary delay lines, and a synchronization block (SYNCHSUP) which also features supervision functions. Many programming capabilities have been implemented to adjust the results of the decoding process, particularly in cascading applications.

Incoming data X, Y1, Y2, Z are coded on 4 bits in 2's complement mode; Z is the extrinsic information. The circuit provides the following outputs :
- properly delayed and ordered data : XO, Y1O, Y2O and the computed ZO.
- hard decisions of the SOVAs : X1O, X2O.
- quality measurement of the decoding operation on 3 bits (Q).
- synchronization information.

The data flow along the chip is the following : as a first step, SOVA1 works on redundancy Y1 with the noisy data X added to Z. Switching and inverting/delaying capabilities have been inserted before SOVA1 in order to cope with the various ambiguities caused by the different modulations used to broadcast the signal. This first stage is automatically positioned by the synchronization module. Then SOVA2 processes Y2 and the interleaved output of SOVA1 from which incoming Z has been subtracted. Finally, the X input of SOVA2 is subtracted from its output and, after de-interleaving, the extrinsic information ZO may be used by the subsequent module as input Z.

3.3 The SOVA block (Figure 3)

The architecture proposed to build a Viterbi decoder providing weighted decisions is based on the SOVA principle (Soft Output Viterbi Algorithm, proposed by J. Hagenauer [3]). It has been chosen for its low complexity and its area-saving properties for an implementation on silicon [4]. Furthermore, it can work at a high clock rate. The a posteriori weighting algorithm is executed in 2 steps : first, a regular Viterbi decoder provides the maximum likelihood path; the survivors are updated in a 33-stage Register Exchange trellis. Then, in a second trellis, 2 paths are retrieved : one follows the likeliest path back 25 more stages; the other one, reaching the same state, is the likeliest discarded path ("concurrent path").
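As an aside on the data flow of Section 3.2, the additions and subtractions around the two SOVAs can be sketched as follows. This is only an illustrative model, not the chip's implementation: the function name decoding_iteration, the permutation list perm and the two callables sova1 and sova2 (standing in for the real soft-output decoders) are assumptions introduced here, and the 4-bit quantization, delay lines and synchronization logic are ignored.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a soft-output decoder: takes the systematic
# input and the redundancy, returns one weighted decision per symbol.
SoftDecoder = Callable[[Sequence[int], Sequence[int]], List[int]]


def decoding_iteration(
    x: Sequence[int],
    y1: Sequence[int],
    y2: Sequence[int],
    z: Sequence[int],
    sova1: SoftDecoder,
    sova2: SoftDecoder,
    perm: Sequence[int],
) -> Tuple[List[int], List[int], List[int]]:
    """One decoding iteration (one module) following Section 3.2.

    perm[i] is the natural-order index read at interleaved position i
    (an assumed convention; the chip's addressing sequence is not modeled).
    Returns (ZO, output of SOVA1, output of SOVA2).
    """
    n = len(x)

    # Step 1: SOVA1 works on Y1 with the noisy data X added to Z.
    in1 = [x[k] + z[k] for k in range(n)]
    out1 = sova1(in1, y1)

    # Step 2: incoming Z is subtracted from the SOVA1 output, and the
    # result is interleaved before entering SOVA2 together with Y2.
    natural = [out1[k] - z[k] for k in range(n)]
    in2 = [natural[perm[i]] for i in range(n)]
    out2 = sova2(in2, y2)

    # Step 3: the X input of SOVA2 is subtracted from its output, then
    # the result is de-interleaved to give the extrinsic information ZO.
    extrinsic = [out2[i] - in2[i] for i in range(n)]
    zo = [0] * n
    for i in range(n):
        zo[perm[i]] = extrinsic[i]

    return zo, out1, out2
```

Cascading several modules then amounts to feeding the returned ZO back as the input z of the next call, with z set to zero for the first iteration.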
The weighting mechanism : from the properly delayed input symbols, transition metrics and path metrics are computed a second time. In order to get the weight W_m(k) of a node m, its 2 accumulated metrics are subtracted and the absolute value of the result is applied to the input of the revision register. Under the control of the revision logic, values are shifted in the register in the following way (k is the time, L and L' are respectively the lengths of the first and of the second Register Exchange) : at a level j, W_j(k) being the content of the register, if the binary decision yielded by the maximum likelihood path (s_j(k)) and the one yielded by the concurrent path (s'_j(k)) are different, and if W_j(k) > W_m(k), then W_j(k) is replaced by W_m(k) and this value is shifted in the register. The final weighted decision of the decoder is given by s_{L+L'}(k) for the sign and by W_{L+L'}(k) for the absolute value. In order to reduce weighting error effects, especially when several modules are cascaded, a clamping function is applied to the last 4 stages of the revision register.

3.4 Interleaving and de-interleaving

These dual stages are necessary to present the noisy data in the right order at the input of each SOVA. The interleaver's memory size is 2K words and the addressing sequence has the following properties : non-uniform interleaving, regular patterns are rejected and, as the memory is split into 2 pages of 1K words, write/read operations are alternately made at different addresses in different pages.
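The revision rule of the weighting mechanism in Section 3.3 can be illustrated with a minimal sketch. Several points are assumptions made for the example: the function name revise_and_shift, the representation of the register as a Python list, the order of operations (revise every stage, then shift by one position with the new weight entering at stage 0), and the omission of the clamping function, whose details are not given in the text.

```python
from typing import List, Sequence, Tuple


def revise_and_shift(
    weights: Sequence[int],
    ml_bits: Sequence[int],
    concurrent_bits: Sequence[int],
    w_m: int,
) -> Tuple[List[int], int]:
    """One clock cycle of a simplified revision register.

    weights[j]         : W_j(k), current content of stage j
    ml_bits[j]         : s_j(k), decision of the maximum likelihood path
    concurrent_bits[j] : s'_j(k), decision of the concurrent path
    w_m                : W_m(k), absolute metric difference at node m

    Returns (new register content, weight leaving the last stage).
    """
    revised = []
    for w_j, s, s_conc in zip(weights, ml_bits, concurrent_bits):
        # Revision rule: where both paths disagree and the stored weight
        # exceeds W_m(k), the stored weight is replaced by W_m(k).
        if s != s_conc and w_j > w_m:
            revised.append(w_m)
        else:
            revised.append(w_j)

    # Assumed shift: the last stage leaves the register as the absolute
    # value of the weighted decision, and W_m(k) enters at the input.
    outgoing = revised[-1]
    shifted = [w_m] + revised[:-1]
    return shifted, outgoing
```

For instance, with stored weights [5, 2, 7], disagreeing decisions at stages 0 and 2, and an incoming weight of 3, both disagreeing stages are lowered to 3 before the shift.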

3.5 Synchronization/supervision

To get the circuit to work properly, interleaving and de-interleaving blocks must be synchronized. Synchronization information can be externally provided to the chip on pin SYNCN or automatically recovered by the circuit.

Principle : starting from an arbitrary position, each bit of the supposed synchronization word is punctured at the input of SOVA1; at the output of this block, the recovered bits are analyzed through a correlator comparing the received word to the reference one. If they are different, the searching position is shifted by one bit and a new supposed word is collected. When enough bits are recognized, the circuit is declared "synchronized", the interleaving and de-interleaving address generators are initialized and the supervision function is set.

The supervision function (i.e. the tracking function) is completely different from the synchronization one and does not refer to the synchronization word. The so-called pseudo-syndrome technique [5] consists in analyzing each couple of data (X, Y) at the input of SOVA2 and evaluating whether it belongs to the code sequence or not. By counting the wrong couples of data (the syndromes), the quality of the decoded signal is evaluated; if it is below a given threshold, the Loss Of Synch signal is activated and a new synchronization process is started.

4. Performances

This section presents results of tests performed on the circuit. Figure 4 shows the results for a Gaussian channel. The global coding rate is 1/2 and up to 5 iterations are performed. In Figure 5 we also have a Gaussian channel and the coding rate is 2/3. The dotted curve represents the 1/2 rate in 5 iterations. In these 2 figures a flattening of the curves can be noticed for the low bit error rates. This degradation is due to the data-path width inside the chip, which uses only 4 bits.

5. Chip characteristics

This circuit performs one iteration of the decoding process, and the required number of iterations is simply obtained by cascading the right number of chips. It contains about 600,000 transistors including 46 kbit of static memory, and the die size (including the pads) is less than 9 mm² in a 0.25 µm, 5-metal CMOS technology (Figure 6); it does not include any dynamic device and can work from a very low frequency up to 54 MHz.

6. Conclusion

The very efficient coding gain of the turbo codes was verified on the turbo4 chip. Work is ongoing in 2 domains : the integration of 4 turbo4 modules in the same circuit, and the design of a multi-purpose chip performing the turbo code decoding algorithm in block mode with various sizes and with an adjustable number of iterations.

7. References

[1] C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding : turbo codes", ICC'93, Geneva.
[2] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding : turbo codes", IEEE Transactions on Communications, vol. 44, no. 10, pp. 1261-1271.
[3] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications", Proc. IEEE Globecom'89, Dallas, Texas, Nov. 1989, pp. 47.1.1-47.1.7.
[4] C. Berrou, P. Adde, E. Angui and S. Faudeil, "A low complexity soft output Viterbi decoder architecture", ICC'93, Geneva, May 1993.
[5] C. Berrou and C. Douillard, "Pseudo-syndrome method for supervising Viterbi decoders at any coding rate", Electron. Letters, vol. 30, no. 13, pp. 1036-1037, June 1994.

Figure 1 : the encoder
Figure 2 : the decoder
Figure 3 : the SOVA block

Figure 4 : Rate = 1/2 (bit error rate versus Eb/No in dB)
Figure 6 : microphotograph of the chip