A 13.3-Mb/s 0.35-µm CMOS Analog Turbo Decoder IC With a Configurable Interleaver

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 11, NOVEMBER 2003, p. 2010

Vincent C. Gaudet, Member, IEEE, and P. Glenn Gulak, Senior Member, IEEE

Abstract: Circuits and an IC implementation of a four-state, block length 16, three-metal one-poly 0.35-µm CMOS analog turbo decoder with a fully programmable interleaver are presented. The IC was tested at 13.3 Mb/s, has a 1.2-µs latency, and consumes 185 mW on a single 3.3-V power supply, resulting in an energy consumption of 13.9 nJ per decoded bit, thus reducing the energy consumption by 70% relative to existing digital turbo decoders. The core area is 1131.2 × 1257.9 µm². The addition of swinging buffers could triple the speed and reduce the latency with minimal increase in power consumption by overlapping storage and decoding phases. Mismatch simulations show that the circuits will be viable for decoder lengths up to a few hundred information bits.

Index Terms: Analog decoding, iterative decoding, turbo codes.

Fig. 1. System block diagrams of two alternative receiver structures. (a) Traditional, with analog-to-digital converter and digital decoder. (b) Proposed, with analog decoder.

I. INTRODUCTION

Due to their impressive error-correcting capabilities at low signal-to-noise ratios (SNRs), turbo codes [1] and low-density parity-check codes [2] are being introduced into several new data communications standards [3]–[5]. However, because of the iterative nature and data dependencies of the decoding algorithm, digital turbo decoders [6]–[8] potentially suffer from long latencies and high power consumption. In order to circumvent these problems, the use of analog circuit techniques to implement iterative decoding was proposed in [9] and [10]. Analog decoders represent internal recursion metrics and soft values as voltages or as currents.
Analog soft outputs are fed from one decoder to the other in a continuous-time analog feedback loop, allowing the outputs to settle on a codeword. Since the analog circuits that perform the required decoding functions are so small, entire factor graphs or trellises can be constructed in parallel in silicon, thus speeding up the decoding process.

Circuit details and an IC implementation of a BiCMOS analog two-state 8-bit tailbiting decoder were recently published in [11], an analog BiCMOS tailbiting decoder with fixed interleaver was presented in [12], and a subthreshold CMOS Hamming decoder was presented in [13]. These decoders are all based on a Gilbert multiplier, which exploits the exponential voltage–current relationship of transistors.

Manuscript received April 11, 2003; revised June 30, 2003. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). V. C. Gaudet is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada (e-mail: vgaudet@ece.ualberta.ca). P. G. Gulak is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada. Digital Object Identifier 10.1109/JSSC.2003.818134

In this paper, we present circuits and an implementation of a CMOS all-analog turbo decoder, including two constituent MAP decoders and a programmable interleaver/deinterleaver. The paper is organized as follows. Section II introduces the concept of analog decoding. Section III discusses analog decoder circuits and design issues. Section IV presents IC test results for a three-metal one-poly (3M1P) 0.35-µm CMOS analog implementation of a turbo decoder. Finally, Section V concludes this paper.

II. ANALOG TURBO DECODING

A. Receiver Structures

Consider the block diagram of a traditional receiver depicted in Fig. 1(a).
A demodulator feeds its analog outputs into an analog-to-digital converter (ADC), which produces fixed-point digital soft inputs that are then processed using a digital decoder. The fixed-point precision required for the soft inputs to the digital decoder is usually in the 5–8 bit range, in order to achieve near-optimal results. The ADC must be able to produce these 5–8 bit values at the rate at which they arrive from the channel, which could amount to several hundred million samples per second in some applications, especially if oversampling is used or if the code rate is low. The digital decoder may produce soft outputs, though hard decisions are easily obtained by retaining only the sign bits.

Now consider Fig. 1(b), which depicts a slightly modified receiver in which the ADCs are removed, leaving only sample-and-hold (S/H) circuits to hold the decoder inputs. The analog outputs from the S/H circuits are then used by an analog decoder, which performs its processing in the analog domain using continuous-valued current and voltage metrics. A bank of comparators converts the analog decoder's soft outputs into a sequence of hard decisions.

0018-9200/03$17.00 © 2003 IEEE
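As a rough illustration of the two receiver structures described above, the following Python sketch contrasts the digital path (an ADC producing fixed-point soft values, followed by a sign-bit hard decision) with the analog path (a comparator acting directly on a continuous value). It is only a behavioral sketch under stated assumptions: the function names, the uniform quantizer, the unit full-scale range, and the convention that a negative value maps to bit 1 are all illustrative choices and are not taken from the paper.

```python
import math

def quantize(samples, n_bits, full_scale=1.0):
    """Behavioral model of an n-bit ADC (assumed uniform, two's-complement).

    Returns signed integer codes in [-2**(n_bits-1), 2**(n_bits-1) - 1].
    Inputs beyond +/- full_scale are clipped to the end codes.
    """
    levels = 2 ** (n_bits - 1)
    step = full_scale / levels
    codes = []
    for x in samples:
        code = math.floor(x / step)
        codes.append(max(-levels, min(levels - 1, code)))
    return codes

def hard_decisions_digital(codes):
    """Digital path: keep only the sign bit of each fixed-point value."""
    return [1 if c < 0 else 0 for c in codes]

def hard_decisions_analog(samples):
    """Analog path: one comparator per output, thresholding at zero."""
    return [1 if x < 0.0 else 0 for x in samples]

if __name__ == "__main__":
    soft = [0.8, -0.3, 0.05, -1.7, 0.0]   # hypothetical soft values
    codes = quantize(soft, n_bits=6)
    print(codes)                           # [25, -10, 1, -32, 0]
    print(hard_decisions_digital(codes))   # [0, 1, 0, 1, 0]
    print(hard_decisions_analog(soft))     # [0, 1, 0, 1, 0]
```

In this model both paths produce the same hard decisions; the difference lies in what has to be carried between blocks: `n_bits` digital wires per soft value in the first case, a single analog quantity in the second.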

Fig. 2. Turbo decoder system block diagram.

Fig. 3. (a) MAX circuit. (b) Transconductor.

An analog decoder is made feasible by the robustness and relatively low precision requirements of iterative decoding algorithms, and by the existence of very simple and compact circuits, described below, which perform the required soft calculations. One might surmise that a receiver based on an analog decoder would benefit from the following advantages.

Lower Power Consumption: In a digital decoder, one wire for every bit of precision sees rail-to-rail voltage swings several times during the decoding process, whereas in an analog decoder only one wire is used in its place and its voltage generally does not swing over the whole supply range. In addition, high-speed ADCs can account for a significant portion of a receiver's power budget. For example, the 6-bit 1.3-Gsample/s 0.35-µm CMOS ADC published in [14] consumes 545 mW on a 3.3-V power supply. Such converters are not required in analog decoder-based receivers.

High Speed: In a digital receiver, throughput is limited either by the maximum speed of the ADCs or by the critical path of the decoder. The critical path of the decoder can only be improved to a very limited extent through parallelization and pipelining. Decoder latency grows linearly with frame size, number of decoding iterations, and clock period. On the other hand, an analog decoder has no ADC-imposed speed limitation, and the small circuit size allows a massive parallelization of the decoding process. In addition, analog decoders typically perform iterations in continuous time, eliminating the high-speed clocking circuitry.
Smaller Area: Depending on the degree of parallelism used in a digital implementation of a decoder and the frame length of the code, an analog decoder can be several times smaller in terms of silicon requirements per decoded bit, thus lowering the production cost.

B. Metric Representation

For the sake of brevity, the details of the turbo decoding algorithm are not provided. Instead, the reader is referred to the summary given in [15]. In an analog decoder, soft information values, such as branch metrics, state metrics, and log-likelihood ratios, are represented as currents or voltages having a continuous-valued amplitude range. The conversion from state or branch metric to physical circuit quantity is done by scaling and shifting values by constant factors. For example, (1) relates a state metric $\alpha$ to a state metric voltage $V_{\alpha}$, using a scaling factor $V_{s}$ and a shifting factor $V_{0}$, and (2) relates a branch metric $\gamma$ to a branch metric current $I_{\gamma}$, using a scaling factor $I_{s}$ and a shifting factor $I_{0}$. Note that $V_{s}$ and $V_{0}$ have voltage units and $I_{s}$ and $I_{0}$ have current units.

$$V_{\alpha} = V_{s}\,\alpha + V_{0} \tag{1}$$

$$I_{\gamma} = I_{s}\,\gamma + I_{0} \tag{2}$$

III. ANALOG DECODER CIRCUITS AND DESIGN ISSUES

A. Circuits

A system block diagram of a turbo decoder is shown in Fig. 2, showing the interconnections between two MAP decoders, an interleaver ($\pi$), and a deinterleaver ($\pi^{-1}$). Each analog log-domain MAP decoder is constructed using the two circuit building blocks depicted in Fig. 3. The first, in Fig. 3(a), realizes the MAX state metric function using transistors M3 and M4, which operate in weak inversion; in moderate inversion, the function is slightly distorted, incurring a penalty of around 0.1 dB, but with better device matching properties. Branch metrics are generated using differential pair transconductors, depicted in Fig. 3(b), which take differential channel input voltages and produce differential output currents. These currents are summed onto the pull-up transistors M1 and M2, which operate deep in the triode region. The differential pairs from Fig.
3(b) are also used to propagate state metric outputs from the MAX circuits to subsequent trellis stages. Fig. 4 shows a trellis stage for one direction of recursion for a (7,5) recursive systematic convolutional (RSC) code, constructed using the MAX and differential pair circuits. Note that the reference state metric voltages are connected to the minus inputs on the state metric propagators, ensuring state metric normalization since state metrics are referenced to known state metric values; this is analogous to subtracting a reference state metric value in a digital MAP decoder.

Fig. 4. (a) Trellis stage and (b) corresponding circuit for a four-state trellis corresponding to the (7,5) RSC code.

An analog interleaver spatially permutes soft outputs from one MAP decoder for input to the other MAP decoder. A configurable interleaver, where a desired permutation is programmed at power-up to accommodate different standards, can be built using a network of crossbars [16]. Fig. 5 illustrates such an interleaver of size 16 constructed with twelve four-input crossbars, each programmed using a serial shift register which controls pass-transistor switches. Two differential signals are routed for every bit position, thus combining the interleaver and deinterleaver into one unit. The interleaver does not present a performance barrier for current turbo decoding applications. Since the pass transistors implement a passive routing network, the power consumed by the interleaver is minimal.

Fig. 5. Size 16 programmable interleaver.

An input demultiplexer sample-and-hold unit stores a serial input stream of noisy channel values, represented as voltages, onto a bank of capacitors, as illustrated in Fig. 6. Each capacitor is directly connected to a branch metric transconductor. Addressing based on a Gray code was used since only one address line changes from one time interval to the next. Otherwise, if two or more address lines changed and the address lines were skewed relative to one another, the demultiplexer would temporarily access an undesired storage location, perturbing that location's stored voltage.

Fig. 6. Demultiplexer used to parallelize a serial input data stream into parallel storage locations.

Decoding is performed in three steps. First, an entire codeword is stored using the demultiplexer, which takes one storage period. Then, the turbo decoder feedback loop is opened for a short reset period to reset the second MAP decoder's soft information to zero. This is accomplished by shorting the differential extrinsic information voltages using pass transistors. Finally, continuous-time iterative decoding is performed by closing the feedback loop for a decoding period and allowing the network to settle to its final output values. Hard decisions are made on soft output values using a bank of output comparators.

B. Mismatch

Large analog circuits are potentially susceptible to mismatch effects since a large number of devices cannot effectively be matched on a large scale. In this section, we present simulation results of the effects of mismatch on the error-correcting capabilities of analog iterative decoders of varying size and complexity.

For simulation purposes, we define mismatch as an undesired variation in input metric voltage according to (3), where the mismatch term is a zero-mean Gaussian random variable of a given variance, generated once for each node at the beginning of a simulation. Two scenarios have been simulated. The first is a case where only a single value of the mismatch term is generated for each trellis section, representing a case where local matching is good, but

global matching (i.e., over large distances) is poor. The second is a case where a new value of the mismatch term is generated for every single input in the circuit.

For the first scenario (good local matching, poor global matching), the loss with respect to a decoder with perfect matching (zero mismatch variance) was observed to be minimal for all simulated mismatch values. For the second scenario (poor local and global matching), a flattening effect was observed, and is illustrated in Fig. 7, which plots the bit-error rate (BER) at an SNR value of 20 dB over a range of mismatch values, for codes of length 20, 100, and 256. The degradation in performance seems to be greater for longer codes, though the circuits should be useful for codes with up to a few hundred information bits.

Fig. 7. Degradation in BER performance due to mismatch.

IV. IC IMPLEMENTATION AND TEST RESULTS

A block diagram of an analog turbo decoder for a rate 1/3 code with four-state trellises, a block length of 16 information bits, and a configurable interleaver/deinterleaver is shown in Fig. 8. Trellises are left unterminated with minimal effect; the final forward state metrics were used as the starting point for the backward recursion. A die microphotograph of a 0.35-µm CMOS implementation of the decoder is shown in Fig. 9. The pad-limited IC occupies 2691 × 2681 µm, whereas the core occupies 1131 × 1258 µm. Each MAP decoder occupies 1131 × 483 µm and the interleaver/deinterleaver occupies 525 × 265 µm.

Fig. 8. Block diagram of analog turbo decoder.

Testing was performed using an HP75000 tester, which stimulates a bank of six digital-to-analog converters (DACs) connected to the IC's inputs using precomputed noisy channel values. Test vectors were generated in software, with a precomputed noise component calibrated to specific SNRs. Test chip outputs were recorded and compared to noiseless reference values in software in order to generate measured BER curves.
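The software side of this test flow (noisy test vectors at a calibrated SNR, then a comparison of recorded outputs against noiseless references) can be sketched as follows. This is an illustrative reconstruction, not the authors' test software: it assumes BPSK signaling with unit symbol energy (bit 0 maps to +1, bit 1 to -1), an additive white Gaussian noise channel, and SNR expressed as Eb/N0; none of these details are stated in the text, and all function names are hypothetical.

```python
import math
import random

def noise_sigma(ebn0_db, code_rate):
    """Noise standard deviation for unit-energy BPSK symbols (assumed model).

    With Es = 1 and Eb = Es / R, the noise variance is
    sigma^2 = N0 / 2 = 1 / (2 * R * Eb/N0).
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))

def make_test_vector(coded_bits, ebn0_db, code_rate, rng):
    """Precompute noisy channel values for one codeword."""
    sigma = noise_sigma(ebn0_db, code_rate)
    return [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in coded_bits]

def bit_error_rate(decoded_bits, reference_bits):
    """Fraction of decoded bits that differ from the noiseless reference."""
    if len(decoded_bits) != len(reference_bits):
        raise ValueError("bit sequences must have equal length")
    errors = sum(d != r for d, r in zip(decoded_bits, reference_bits))
    return errors / len(reference_bits)

if __name__ == "__main__":
    rng = random.Random(1)                      # fixed seed: repeatable vectors
    coded = [rng.randint(0, 1) for _ in range(48)]   # 16 info bits at rate 1/3
    vector = make_test_vector(coded, ebn0_db=2.0, code_rate=1.0 / 3.0, rng=rng)
    # Stand-in for the chip: uncoded hard decisions on the channel values.
    hard = [1 if v < 0.0 else 0 for v in vector]
    print(bit_error_rate(hard, coded))
```

In an actual measurement the `hard` list would be replaced by the recorded chip outputs, and the BER would be accumulated over many codewords per SNR point.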
The IC was tested at a bit rate of 13.3 Mb/s with 16-bit codewords, with the three decoding phases lasting … ns (RST high), … ns (RST high), and … ns (RST low). The decoding latency was 1.2 µs. With a 3.3-V supply, the decoder has a power consumption of 185 mW, resulting in an energy consumption per decoded bit of 13.9 nJ. Fig. 10 shows an oscilloscope capture of the hard output of bit position 15, and of the reset signal, which opens the turbo feedback loop before each codeword is decoded.

Fig. 9. Die microphotograph and floorplan.

Fig. 11 presents the BER curves for the measured output of a single MAP decoder and of the turbo decoder, contrasted with a simulated floating-point digital decoder. The 1.1-dB loss from ideal turbo decoding is largely a result of MAP2 extrinsic information differential voltages not being properly shorted at the beginning of the decoding phase, leading to old soft values being injected into the feedback loop. This also explains why the MAP decoder test results are better than the simulated results:

when RST is high, MAP2 injects unwanted extrinsic information into MAP1's inputs, allowing MAP1 to perform better than expected.

Fig. 10. Measured bit 15 signal (top) and reset signal (bottom).

Fig. 11. Measured BER for analog turbo decoder.

Fig. 12. Measured energy consumption of turbo decoder implementations.

TABLE I. PERFORMANCE SPECIFICATIONS OF THE ANALOG TURBO DECODER IC

Performance degradation was observed when the outputs were sampled earlier in the decoding phase, not allowing output voltages to settle to their final values. Speed could be increased by using swinging buffers, which would allow the storage phase to overlap with the reset and decoding phases. An incoming codeword could then be stored at the same time as another codeword was decoded.

Fig. 12 shows a comparison between the energy consumption per decoded bit of the analog implementation and that of recent digital turbo decoders, normalized for the number of trellis states. A decrease in energy consumption of approximately 70% has been achieved, notwithstanding the ADCs that would be required for digital implementations.

Table I summarizes the performance specifications for the analog turbo decoder IC. The programmable analog interleaver was verified by testing BERs using several different reference permutations. Decoding speed could be increased to 40 Mb/s at a similar power consumption by expanding the input demultiplexer to include a swinging buffer, thus reducing the energy per decoded bit by 2/3.

V. CONCLUSION

In a digital decoder, latency increases with block size since the last state metric in a block is technically dependent on the first, and hence, calculations must be performed on the entire block before moving on to a subsequent iteration. In an analog decoder, even though the dependencies between values are the same as for a digital decoder, they get weaker and weaker with increasing block length.
Thus, the effect of a voltage change at one end of the trellis on a metric at the other end goes to zero with increasing block length, and the effect eventually gets lost in noise after a number of trellis stages. Therefore, it is expected that with this technique, decoding latency will remain relatively constant with increased block size, resulting in a near-linear increase in throughput versus block length and relatively constant energy per decoded bit. We expect future analog decoder research to extend practical block sizes up to a few hundred information bits.

ACKNOWLEDGMENT

The authors wish to acknowledge fabrication support from the Canadian Microelectronics Corporation.

REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. Int. Conf. Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
[2] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.
[3] (2000) Third Generation Partnership Project (3GPP) Doc. 25.212. [Online]. Available: http://www.3gpp.org/ftp/specs/2000 12/ R1999/25_series/25212 350.zip
[4] "Interaction channel for satellite distribution systems," ETSI, Doc. EN 301 790, vol. 1.2.2, 2000.
[5] "Interaction channel for digital terrestrial television," ETSI, Doc. EN 301 958, vol. 1.1.1, 2001.
[6] C. Berrou, P. Combelles, P. Pénard, and B. Talibart, "An IC for turbo-codes encoding and decoding," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 1995, pp. 90–91.
[7] M. Bekooij, J. Dielissen, F. Harmsze, S. Sawitzki, J. Huisken, A. van der Werf, and J. van Meerbergen, "Power-efficient application-specific VLIW processor for turbo decoding," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp. 180–181.
[8] M. Bickerstaff, D. Garrett, T. Prokop, C. Thomas, B. Widdup, G. Zhou, C. Nicol, and R.-H. Yan, "A unified Turbo/Viterbi channel decoder for 3GPP mobile wireless in 0.18 µm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2002, pp. 90–91.
[9] J. Hagenauer and M. Winklhofer, "The analog decoder," in Proc. IEEE Int. Symp. Information Theory, 1998, p. 145.
[10] H.-A. Loeliger, M. Helfenstein, F. Lustenberger, and F. Tarkoy, "Decoding in analog VLSI," in Proc. IEEE Int. Symp. Information Theory, 1998, p. 146.
[11] M. Moerz, T. Gabara, R. Yan, and J. Hagenauer, "An analog 0.25 µm BiCMOS tailbiting MAP decoder," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2000, pp. 356–357.
[12] F. Lustenberger, M. Helfenstein, H.-A. Loeliger, F. Tarkoy, and G. S. Moschytz, "All-analog decoder for a binary (18,9,5) tail-biting trellis code," in Proc. ESSCIRC, Duisburg, Germany, Sept. 1999, pp. 362–365.
[13] C. Winstead, J. Die, S. Yu, R. Harrison, C. J. Myers, and C. Schlegel, "Analog MAP decoder for (8,4) Hamming code in subthreshold CMOS," in Proc. IEEE Int. Symp. Information Theory, June 2001, p. 330.
[14] M. Choi and A. A. Abidi, "A 6b 1.3 GSample/s A/D converter in 0.35-µm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp. 126–127.
[15] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, pp. 429–445, Mar. 1996.
[16] V. C. Gaudet, R. J. Gaudet, and P. G. Gulak, "Programmable interleaver design for analog iterative decoders," IEEE Trans. Circuits Syst. II, vol. 49, pp. 457–464, July 2002.

Vincent C. Gaudet (S'97–M'03) received the B.Sc. degree in computer engineering from the University of Manitoba, Winnipeg, MB, Canada, in 1995 and the M.Appl.Sci. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1997 and 2003, respectively. From February to July 2002, he was a Research Associate with the Département Electronique, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France. Currently, he is an Assistant Professor with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. His research interests include the design of algorithms and integrated circuits for high-speed digital communications. Dr. Gaudet received the Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship, the Ontario Graduate Scholarship in Science and Technology, and the Walter Sumner Memorial Fund Scholarship, and was awarded the University of Manitoba's University Gold Medal in 1995. He currently serves as the IEEE Northern Canada Section Vice Chair, and is registered as an Engineer-in-Training in the Province of Ontario.

P. Glenn Gulak (S'82–M'83–SM'96) received the Ph.D. degree from the University of Manitoba, Winnipeg, MB, Canada, in 1984. From 1985 to 1988, he was a Research Associate with the Information Systems Laboratory and the Computer Systems Laboratory, Stanford University, Stanford, CA. Currently, he is a Professor with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada, where he holds the L. Lau Chair in electrical and computer engineering. His research interests are in the areas of memory design, circuits, algorithms, and VLSI architectures for digital communications. Dr. Gulak is a registered Professional Engineer in the Province of Ontario. He received a Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship and several teaching awards for undergraduate courses taught in both the Department of Computer Science and the Department of Electrical and Computer Engineering at the University of Toronto. He served as the Technical Program Chair for the IEEE International Solid-State Circuits Conference in 2001.