Design of Multistage Decimation Filters Using Cyclotomic Polynomials: Optimization and Design Issues Massimiliano Laddomada, Member, IEEE


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 55, NO. 7, AUGUST 2008, p. 1977

Abstract: This paper focuses on the design of multiplier-less decimation filters suitable for oversampled digital signals. The aim is twofold. On one hand, it proposes an optimization framework for the design of constituent decimation filters in a general multistage decimation architecture. The basic building blocks embedded in the proposed filters belong, for a simple reason, to the class of cyclotomic polynomials (CPs): the first 104 CPs have a z-transfer function whose coefficients are simply $\{-1, 0, +1\}$. On the other hand, the paper provides a set of useful techniques, most of which stem from some key properties of CPs, for designing the proposed filters in a variety of architectures. Both recursive and non-recursive architectures are discussed by focusing on a specific decimation filter obtained as a result of the optimization algorithm. Comparisons are given with respect to classical comb filters and some recently proposed techniques. Design guidelines are provided with the aim of simplifying the design of the constituent decimation filters in the multistage chain.

Index Terms: Analog-digital (A/D) converter, cascade-integrator comb (CIC) filter, comb, cyclotomic, decimation, decimation filter, multistage, polynomial, sigma-delta ($\Sigma\Delta$), sinc filters.

I. INTRODUCTION AND PROBLEM FORMULATION

The design of multistage decimation filters for oversampled signals is a well-known research topic [1]. Mainly inspired by the need for computationally efficient architectures for wide-band, multistandard, reconfigurable receiver design, this research topic has recently garnered new emphasis in the scientific community [2]–[5]. Multistage decimation filters are also employed for decimating highly oversampled signals from noise-shaping analog-digital (A/D) converters [6].

Given a base-band analog input signal with bandwidth $f_x$, an A/D converter produces a digital signal by sampling at rate $f_s = 2\rho f_x$, whereby $\rho$ is the oversampling ratio (notice that $\rho \gg 1$ for oversampled signals). The normalized maximum frequency contained in the input signal is defined as $f_c = f_x / f_s = 1/(2\rho)$, and the digital signal at the input of the first decimation filter has frequency components belonging to the range $[-f_c, f_c]$. Fig. 1 shows the resulting decimation chain together with the key frequency intervals.

Owing to the condition $\rho \gg 1$, the decimation of an oversampled signal is efficiently accomplished [1] by cascading two (or more) decimation stages as highlighted in Fig. 1, in which a multistage architecture composed of $m$ decimation stages is shown as a reference scheme. Consider an oversampling ratio $\rho$ which can be factorized as follows:

$$\rho = \prod_{i=1}^{m} D_i$$

whereby, for any $i$, $D_i$ is an appropriate integer strictly greater than zero. In the general architecture shown in Fig. 1, the sampling rate decreases in consecutive stages, whereby the sampling rate at the input of the $i$th stage is $f_i = f_s / \prod_{j=1}^{i-1} D_j$ while the data rate at its output is $f_i / D_i$.

The design of any decimation stage in a multistage architecture imposes stringent constraints on the shape of the frequency response over the so-called folding bands. Considering the scheme in Fig. 1, the frequency response of the $i$th decimation filter must attenuate the quantization noise (QN) falling inside the frequency ranges defined as

$$\left[\frac{k}{D_i} - f_c^{(i)},\ \frac{k}{D_i} + f_c^{(i)}\right], \quad k = 1, \ldots, k_M, \qquad k_M = \begin{cases} D_i/2, & D_i \text{ even} \\ (D_i - 1)/2, & D_i \text{ odd} \end{cases} \tag{1}$$

whereby $f_c^{(i)}$ is the normalized signal bandwidth at the input of the $i$th decimation filter.

Manuscript received July 13, 2007; revised September 27, 2007 and December 4, 2007. First published February 8, 2008; last published August 13, 2008 (projected). This paper was recommended by Associate Editor S.-M. Phoong. The author is with the Department of Electrical Engineering, Texas A&M University, Texarkana, TX 75505 USA (e-mail: laddomada@polito.it). Digital Object Identifier 10.1109/TCSI.2008.918193
The reason is simple: the QN falling inside these frequency bands will fold down to baseband (i.e., inside the useful signal bandwidth $[-f_c^{(i)}, f_c^{(i)}]$) because of the sampling rate reduction by $D_i$ in the $i$th decimation stage, irremediably affecting the signal resolution after the multistage decimation chain. On the other hand, the frequency ranges labelled as don't care bands in Fig. 1 do not require a stringent selectivity, since the QN within these bands will be rejected by the subsequent filters in the multistage chain. The relation between $f_c^{(i)}$ and $f_c$ is as follows:

$$f_c^{(i)} = f_c \prod_{j=0}^{i-1} D_j$$

whereby it is $D_0 = 1$.

1549-8328/$25.00 © 2008 IEEE
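As a minimal sketch of the folding-band definition, the snippet below lists the bands for one stage. It assumes the bands are intervals of half-width $f_c^{(i)}$ centred on the multiples $k/D_i$ of the output rate, clipped at the Nyquist frequency $1/2$; the function name `folding_bands` and the numeric values ($D = 8$, $f_c = 1/128$) are illustrative choices, not values taken from the paper.

```python
from fractions import Fraction

def folding_bands(D, fc):
    """Return the folding bands of a decimate-by-D stage as (low, high) pairs.

    Assumed form: [k/D - fc, k/D + fc] for k = 1..floor(D/2), with the upper
    edge clipped at the Nyquist frequency 1/2.
    """
    bands = []
    for k in range(1, D // 2 + 1):
        centre = Fraction(k, D)
        upper = min(centre + fc, Fraction(1, 2))
        bands.append((centre - fc, upper))
    return bands

# Hypothetical example: first stage decimating by 8, signal band edge 1/128.
for low, high in folding_bands(8, Fraction(1, 128)):
    print(f"[{float(low):.5f}, {float(high):.5f}]")
```

Exact rational arithmetic is used so that band edges can later be compared with the zero locations of a CP without rounding issues.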

Fig. 1. General architecture of an $m$-stage decimation chain for A/D converters, along with a pictorial representation of the key frequency intervals to be carefully considered for the design of the $i$th decimation stage. The sampling rate at the input of the $i$th decimation stage is $f_i$, $\forall i = 1, \ldots, m$.

The $i$th decimation filter introduces a pass-band ripple, which can also be expressed in decibels as in (2), while the selectivity (in decibels) is given by (3).

With this background, let us provide a quick survey of the recent literature related to the problem addressed here. This survey is by no means exhaustive and is meant to simply provide a sampling of the literature in this fertile area. Excellent tutorials on the design of multirate filters can be found in [7], [8], while an essential book on this topic is [1]. Recently, Coffey [9], [10] addressed the design of optimized multistage decimation and interpolation filters. The design of cascade-integrator comb (CIC) filters was first addressed in [11], while multirate architectures embedding comb filters have been discussed in [12]. Since then, many papers [13] have focused on the computational optimization of CIC filters, even in the light of new wideband and reconfigurable receiver design applications [14]–[16]. Comb filters have then been generalized in [17]–[21], especially in relation to the decimation of $\Sigma\Delta$ modulated signals. Other works somewhat related to the topic addressed in this paper are [22]–[28]. The use of sharpened decimation filters embedding comb filters is addressed in [20]–[22], while in [23] the authors proposed computationally efficient decimation filter architectures using polyphase decomposition of comb filters. Dolecek and Mitra proposed a novel two-stage sharpened comb decimator in [24].
The design of FIR filters using cyclotomic polynomial (CP) prefilters has been addressed in [25], while effective algorithms for the design of low-complexity FIR filters embedding CP prefilters have been proposed in [26]–[28].

Owing to the discussion on the folding bands presented above, this paper addresses the design of computationally efficient decimation filters suitable for oversampled digital signals. Natural eligible blocks used in filter design are CPs with order less than 105, since these polynomials possess coefficients belonging to the set $\{-1, 0, +1\}$. We first recall the basic properties of CPs in Section II, since these properties suggest useful hints at the basis of the practical implementation of the designed decimation filters. For conciseness, we address the design of the first stage in the multistage architecture, even though the considerations which follow are easily applicable to any other stage in the chain. The computational complexity of basic CP filters is discussed in Section III. In Section IV we propose an optimization framework whose main aim is to design an optimal decimation filter (optimal in that the cost function to be minimized accounts for the number of additions required by the chosen CP filter) featuring high selectivity within the folding bands seen from the $i$th decimation stage. The practical implementation of the designed decimation filters is addressed in Section V, whereby both recursive and nonrecursive architectures, stemming from a variety of properties of polynomials, are discussed. Finally, Section VI draws the conclusions.

II. BASICS OF CPS AND KEY PROPERTIES

CPs date back to the ancient Greek problem of dividing a circle into equal parts. Key properties of such polynomials, along with the basic rationales, can be found in various number theory books (we invite the interested readers to refer to [29], [30]), as well as in some recent papers [30].
Given an integer $N$ strictly greater than zero, the polynomial $1 - z^{-N}$ can be factorized as a product of CPs as follows:

$$1 - z^{-N} = \prod_{d \mid N} C_d(z) \tag{4}$$

whereby $d \mid N$ identifies the set of integers $d$, less than or equal to $N$, which divide $N$ (in other words, the remainder of the division between $N$ and $d$ is zero). For each $d$ as above, there is a unique polynomial $C_d(z)$ whose roots satisfy the following conditions.

- For each $d$, the roots of $C_d(z)$ constitute a subset of the roots belonging to the polynomial $1 - z^{-N}$.
- The roots of $C_d(z)$ are the primitive $d$th roots of unity, i.e., they all fall on the $z$-plane unit circle.
- The number of roots corresponds to the number of positive integers which are prime with respect to $d$, and smaller than $d$.
- Roots of $C_d(z)$ do not belong to the set of roots of the polynomial $1 - z^{-q}$, $\forall q < d$.

Based on the observations above, CPs are defined as

$$C_N(z) = \prod_{\substack{1 \le k \le N \\ (k, N) = 1}} \left(1 - e^{j 2\pi k / N} z^{-1}\right) \tag{5}$$

whereby $(k, N) = 1$ is used to mean that $k$ and $N$ are co-prime [29]. Notice that, given an integer $N$, (5) allows us to write the z-transfer function of any CP indexed by $N$. Key advantages of CPs in connection to filter design rely on the following property: if $N$ has no more than two distinct odd prime factors, the polynomials $C_N(z)$ contain coefficients belonging to the set $\{-1, 0, +1\}$. From a practical point of view, CP coefficients belong to the set $\{-1, 0, +1\}$ if $N < 105$ [29], [30]. The degree of the polynomial $C_N(z)$ is not $N$, but it is defined as follows:

$$\varphi(N) = \sum_{d \mid N} d\, \mu\!\left(\frac{N}{d}\right) \tag{6}$$

whereby $\varphi(N)$ is the totient function, i.e., the number of positive integers less than or equal to $N$ that are relatively prime (see footnote 1) to $N$, while $\mu(\cdot)$ is the Möbius function defined as

$$\mu(N) = \begin{cases} 1, & N = 1 \\ (-1)^k, & N = p_1 p_2 \cdots p_k \text{ with } p_i \text{ distinct primes} \\ 0, & \text{if } N \text{ is divisible by the square of a prime.} \end{cases} \tag{7}$$

Index $k$ in the second entry stands for the number of distinct prime numbers which decompose the argument $N$.

Footnote 1: Two numbers are said to be relatively prime if they do not contain any common factor. Notice that the integer 1 is considered as being relatively prime to any integer number.

The z-transfer function of a CP with square-free index $N$ is [31]

$$C_N(z) = \sum_{j=0}^{\varphi(N)} c_j z^{-j} \tag{8}$$

whereby the coefficients $c_j$ can be evaluated with the following recursive relation:

$$c_j = -\frac{\mu(N)}{j} \sum_{m=0}^{j-1} c_m\, \mu\big(\gcd(N, j-m)\big)\, \varphi\big(\gcd(N, j-m)\big) \tag{9}$$

using the initial value $c_0 = 1$. Function $\gcd(a, b)$ in (9) is the greatest common divisor between $a$ and $b$. Notice that (9) represents an effective algorithm for automatically generating the z-transfer function of CPs with square-free indexes.

Perhaps the main properties useful for deducing the z-transfer function of any CP are the ones summarized in the following [29]. We will discuss the application of such properties in Section III, whereby the focus will be on the design of low-complexity CPs in terms of both additions and delays.

1) Given a prime number $p$, it is

$$C_p(z) = \sum_{i=0}^{p-1} z^{-i} = \frac{1 - z^{-p}}{1 - z^{-1}}. \tag{10}$$

2) Let $N$, $p$, and $k$ be three positive integers. Then, it is

$$C_{N p^k}(z) = C_{N p}\!\left(z^{p^{k-1}}\right). \tag{11}$$

3) Consider a prime number $p$ which does not divide $N$; then

$$C_{pN}(z) = \frac{C_N(z^p)}{C_N(z)}. \tag{12}$$

4) Given any odd integer $N$ greater than or equal to 3, then it is

$$C_{2N}(z) = C_N(-z). \tag{13}$$

5) For $z = 1$ and $N > 1$, the following relation holds:

$$C_N(1) = \begin{cases} p, & N = p^k \text{ with } p \text{ prime} \\ 1, & \text{otherwise.} \end{cases} \tag{14}$$

This relation assures us that, for indexes $N > 1$, the z-transfer function of the respective CP presents unity gain in baseband provided that $N \neq p^k$. Otherwise, CP transfer functions have to be normalized by $p$ in order to assure unity gain in baseband.

III. CRITERIA FOR IDENTIFYING LOW-COMPLEXITY CPS

The z-transfer function of CPs for any index $N$ can be deduced upon employing the relation (5) along with the properties stated in (10)–(13). Different architectures (both recursive and nonrecursive) for implementing each CP can be obtained, mainly differing in the number of required additions and delays. For conciseness, in this paper we show the z-transfer functions of the first sixty CPs in Table II (shown later); the z-transfer functions of $C_N(z)$ for any $N \le 104$ in both nonrecursive and recursive (if any) form can be found in [32].

Let us discuss some key examples by starting from the CP $C_{33}(z)$. Considering that 33 is square-free, and given that $N = 33$ can be written as $3 \cdot 11$, whereby 3 and 11 are coprime, there are three possible architectures for implementing such a polynomial. The first one stems from (8) and (9), and it consists of a nonrecursive architecture (see Table II) employing 14 additions and 20 delays. On the other hand, two recursive architectures follow upon using property (12) with $p = 3$, $N = 11$, and with $p = 11$, $N = 3$:

$$C_{33}(z) = \frac{C_{11}(z^{3})}{C_{11}(z)}, \qquad C_{33}(z) = \frac{C_{3}(z^{11})}{C_{3}(z)}. \tag{15}$$
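The recursion for square-free indexes can be sketched in a few lines. The code below assumes the recursion has the form $c_j = -\frac{\mu(N)}{j}\sum_{m=0}^{j-1} c_m\,\mu(g)\,\varphi(g)$ with $g = \gcd(N, j-m)$ and $c_0 = 1$, and then checks the $C_{33}(z)$ example: 20 delays, 14 additions, and the identity $C_{33}(z)\,C_3(z) = C_3(z^{11})$ that follows from property (12). All function names are illustrative.

```python
from math import gcd

def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def mobius(n):
    """Moebius function: 0 if n has a squared prime factor, else (-1)^k."""
    factors = prime_factors(n)
    if len(factors) != len(set(factors)):
        return 0
    return -1 if len(factors) % 2 else 1

def totient(n):
    """Euler totient: count of 1 <= k <= n coprime with n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def cp_squarefree(n):
    """Coefficients c_0..c_phi(n) of C_n in powers of z^-1 (n square-free)."""
    if mobius(n) == 0:
        raise ValueError("index must be square-free")
    c = [1]
    for j in range(1, totient(n) + 1):
        s = sum(c[m] * mobius(gcd(n, j - m)) * totient(gcd(n, j - m))
                for m in range(j))
        numerator = -mobius(n) * s
        assert numerator % j == 0      # coefficients are always integers
        c.append(numerator // j)
    return c

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for k, y in enumerate(b):
            out[i + k] += x * y
    return out

c33 = cp_squarefree(33)
delays = len(c33) - 1                            # filter order
additions = sum(1 for x in c33 if x != 0) - 1    # nonzero taps minus one
print(delays, additions)
```

The addition count here is simply the number of nonzero taps minus one, which is the usual accounting for a direct-form nonrecursive filter with coefficients in $\{-1, 0, +1\}$.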

As far as the number of additions is concerned, from (15) it easily follows that a recursive architecture requires only 4 additions, which compares favorably with both the nonrecursive implementation and the alternative recursive one. Notice also that, since CP coefficients are simply $\{-1, 0, +1\}$, the recursive architectures can be implemented without coefficient quantization; this in turn suggests that exact pole-zero cancellation is not a concern with these architectures. On the other hand, the nonrecursive architecture requires only 20 delays as opposed to the recursive architectures requiring, respectively, 34 and 22 delays.

In this work, we suppose that the computational complexity of the filter depends only on the number of additions. Upon comparing both recursive and nonrecursive architectures in Table II for any index $N$ (see also [32] for a complete list of the first 104 CPs), it easily follows that recursive implementations, when they do exist, allow the reduction of the number of additions with respect to nonrecursive implementations; the price to pay, however, is the increased filter delay. As a rule of thumb, nonrecursive architectures should be preferred to recursive implementations when memory space is a design constraint. On the other hand, recursive architectures can greatly reduce the number of additions.

Let us briefly discuss the possible architectures related to an even-indexed CP, such as $C_{60}(z)$. By virtue of the different ways to factorize the integer 60, property (12) can be applied with the combinations $p = 3$, $N = 20$ and $p = 5$, $N = 12$, whereby in both cases $p$ is a prime integer not dividing $N$. Property (11) can be applied with $N = 15$, $p = 2$, $k = 2$. In Table II we show only the recursive and the nonrecursive architectures yielding the lowest complexities. When $N$ is a prime number, the z-transfer function of the related CP corresponds to the first-order comb filter, as can be straightforwardly seen from (10).
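The claim about the coefficient set can be checked numerically. The sketch below builds each CP by exact polynomial division, assuming the factorization $1 - z^{-N} = \prod_{d \mid N} C_d(z)$ with polynomials written in powers of $x = z^{-1}$; the helper names are illustrative and this is not the generation procedure used in [32].

```python
from functools import lru_cache

def poly_divide(num, den):
    """Exact division of integer polynomials (ascending coefficient order)."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        q = num[i + len(den) - 1] // den[-1]
        quot[i] = q
        for j, c in enumerate(den):
            num[i + j] -= q * c
    assert not any(num), "division was not exact"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of C_n in powers of z^-1, from 1 - z^-n = prod_{d|n} C_d."""
    poly = [1] + [0] * (n - 1) + [-1]          # 1 - x^n
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divide(poly, cyclotomic(d))
    return tuple(poly)

# Coefficients stay in {-1, 0, +1} up to index 104; index 105 is the first exception.
ok_up_to_104 = all(set(cyclotomic(n)) <= {-1, 0, 1} for n in range(1, 105))
print(ok_up_to_104, min(cyclotomic(105)))

# The baseband gain C_n(1) is the sum of the coefficients: p for n = p^k, else 1.
print(sum(cyclotomic(7)), sum(cyclotomic(9)), sum(cyclotomic(6)))
```

The coefficient sums also give the normalization factors needed for unity gain in baseband, which is how they are used in the sketches that follow.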
Finally, property (13) can be effectively employed for deducing the z-transfer function of CPs with even indexes of the form $2N$, with $N$ an odd number strictly greater than 2. As an example, notice the following relations: $C_6(z) = C_3(-z)$, $C_{10}(z) = C_5(-z)$.

The simple examples presented above are by no means a complete picture of the capabilities and sophistication that can be found in multistage structures for sampling rate conversion. They are merely intended to show why such structures can constitute the starting point for obtaining computationally efficient filters for decimating oversampled signals. The design of computationally efficient decimation filters relies on the combination of an appropriate set of CPs. In oversampled A/D converters, for example, it is very important to contain the computational burden of the first stages in the multistage decimation chain. This motivates the study of an effective algorithm for identifying an appropriate set of CPs that, when appropriately cascaded, is able to attain a set of prescribed requirements as specified in (2) and (3): this is the topic addressed in the next section.

IV. OPTIMIZATION ALGORITHM AND DESIGN EXAMPLES

This section presents an optimization framework for designing low-complexity decimation filters, $H_i(z)$, as a cascade of CP subfilters. For the derivations which follow, consider the design of the $i$th decimation filter in the multistage chain depicted in Fig. 1, with a frequency response that can be represented as follows:

$$H_i(e^{j 2\pi f}) = \prod_{n \in \mathcal{S}} \left[ C_n(e^{j 2\pi f}) \right]^{k_n} \tag{16}$$

whereby $f$ is the digital frequency normalized with respect to the sampling frequency as discussed in Section I, $\mathcal{S}$ is an appropriate set of eligible CPs to be used in the optimization framework ($|\mathcal{S}|$ is the cardinality of the set, i.e., the number of eligible CPs), $C_n(e^{j 2\pi f})$ is the frequency response of the CP indexed by $n$, and $k_n$ is the integer order by which $C_n(z)$ appears in the cascade constituting $H_i(z)$ (it is $k_n \ge 0$).
A suitable cost function accounting for the complexity of the $i$th decimation filter can be defined as a weighted combination of the number of adders and delays required by the overall filter [27]:

$$J = \sum_{n \in \mathcal{S}} k_n \left( a_n + \beta\, d_n \right) \tag{17}$$

whereby $a_n$ and $d_n$ are, respectively, the number of adders and delays of the CP $C_n(z)$, and $\beta$ is a factor depending on the relative complexity of the delays with respect to the adders. In our setup, we assume that the computational complexity of the $i$th decimation filter is mainly due to the number of adders; therefore, we set $\beta = 0$. Notice that the cost function depends on the CP orders $k_n$, while $a_n$ and $d_n$ are known once the set $\mathcal{S}$ of eligible CPs has been appropriately identified. Notice also that $a_n$ and $d_n$ can be straightforwardly obtained from Table II [32].

Let us address the choice of the eligible CPs in the set $\mathcal{S}$. This is one of the most important design steps, since the complexity of the optimization framework discussed below is tied tightly to the number of eligible CPs. By virtue of the discussion on the folding bands spanned by the $i$th decimation filter, we choose the eligible CPs among the first 104 CPs in such a way that: 1) at least 20% of the zeros fall within the folding bands defined in (1) and 2) none of the zeros falls in the signal pass-band ranging from 0 to $f_c^{(i)}$. As a result of extensive tests, we adopted such a threshold, which is capable of rejecting about 20–60 of the initial CPs, depending on $D_i$. Of course, lower thresholds can increase the number of eligible CPs at the cost of an increased complexity of the optimization framework discussed below. On the other hand, when designing the $i$th decimation filter in a multistage architecture, only the so-called folding bands must be spanned by zeros, since the don't care frequency bands will be appropriately spanned by the zeros belonging to the subsequent decimation filters in the cascade.

Before presenting the optimization algorithm, let us discuss the requirements imposed on the frequency response of the $i$th decimation filter in the cascade.
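The eligibility rule described above (at least 20% of the zeros inside the folding bands, none inside the signal pass-band) can be sketched as follows. The code assumes that the zeros of the $n$th CP sit at the normalized frequencies $k/n$ with $\gcd(k, n) = 1$ and that the folding bands are the intervals $k/D \pm f_c$; the function name and the values $D = 8$, $f_c = 1/128$ are hypothetical and are not the settings behind Table I.

```python
from fractions import Fraction
from math import gcd

def is_eligible(n, D, fc, threshold=Fraction(1, 5)):
    """Screen the nth CP for a decimate-by-D stage with band edge fc.

    Zeros are the primitive nth roots of unity, at frequencies k/n with
    gcd(k, n) = 1; they are folded onto [0, 1/2] by spectral symmetry.
    """
    zeros = [Fraction(k, n) for k in range(1, n + 1) if gcd(k, n) == 1]
    folded = [min(f, 1 - f) for f in zeros]
    # Rule 2: no zero may fall inside the signal pass-band [0, fc].
    if any(f <= fc for f in folded):
        return False
    # Rule 1: at least `threshold` of the zeros must lie in a folding band.
    centres = [Fraction(k, D) for k in range(1, D // 2 + 1)]
    in_bands = sum(1 for f in folded if any(abs(f - c) <= fc for c in centres))
    return Fraction(in_bands, len(folded)) >= threshold

D, FC = 8, Fraction(1, 128)   # hypothetical stage parameters
eligible = [n for n in range(1, 105) if is_eligible(n, D, FC)]
print(eligible)
```

With these illustrative settings the CPs $C_2$, $C_4$, and $C_8$, whose product is the length-8 moving average, pass the screen, while $C_1$ is rejected because its zero lies at $f = 0$.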
Mask specifications [1] are given as for classical filters as far as the passband ripple is concerned. In particular, for the optimization algorithm we use the passband ripple in (2) expressed in decibels. One of the main differences between the design proposed in this work and other classical finite-impulse response (FIR) filter design techniques relies on the fact that in our setup the specifications are only imposed in the folding bands (1). To this end, we evaluated the lowest attenuation (worst case) attained by each CP belonging to $\mathcal{S}$ in each folding band, whereby each CP has been normalized in such a way as to have unity gain in baseband. Notice that the normalization factors can be deduced from (14). We denote by $A_{n,j}$ the worst attenuation of the $n$th CP in $\mathcal{S}$ within the $j$th folding band, with $j = 1, \ldots, k_M$, and $k_M$ defined in (1). Such values (in decibels) have been stored in look-up tables.

Once the set of eligible CPs along with the appropriate specifications (passband ripple and folding band attenuations) have been identified, the optimization problem (18) can be formulated as follows: minimize the cost function in (17) over the CP orders $k_n$, subject to the passband ripple and folding-band attenuation constraints.

The optimization problem can also be solved for different prescribed selectivities (as specified in (3)) around the various folding bands. In this work we do not pursue this approach. However, notice that such an approach can be effective for noise-shaping A/D converters, which present an increasing noise power spectral density for higher and higher values of the digital frequency [18]. Setting ever-increasing selectivities in correspondence of successive folding bands can help mitigate the noise folding due to the decimation process.

The solution of the optimization problem in (18) is the set of CP orders $k_n$, whereby $k_n = 0$ signifies the fact that the $n$th CP in $\mathcal{S}$ is not employed for synthesizing $H_i(z)$. Upon collecting the set of conditions in matrix form, the constraints in (18) can be rewritten as a system of linear inequalities in the CP orders. By this setup, the optimization problem in (18) can be rewritten as an integer linear program and solved by mixed integer linear programming techniques [33]. We solved the optimization problem using the MATLAB function linprog, along with a new MATLAB file capable of managing integer-constrained solutions (the latter file is available online [34]).
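To make the structure of the problem concrete, here is a toy version solved by exhaustive search rather than by the mixed integer linear programming used in the paper. It assumes that worst-case attenuations (in decibels) add linearly over the cascade, restricts the candidate set to $C_2$, $C_4$, $C_8$, and uses hypothetical specifications (60 dB in every folding band, 1 dB passband droop, $D = 8$, $f_c = 1/128$); none of these numbers come from Table I.

```python
import cmath
import math
from itertools import product

D, FC = 8, 1 / 128                 # hypothetical stage parameters
CPS = {2: [1, 1], 4: [1, 0, 1], 8: [1, 0, 0, 0, 1]}   # taps in powers of z^-1
ADDS = {2: 1, 4: 1, 8: 1}          # additions per CP (nonzero taps minus one)

def attenuation_db(coeffs, f):
    """Attenuation in dB at frequency f, after normalizing to unity DC gain."""
    h = sum(c * cmath.exp(-2j * math.pi * f * n) for n, c in enumerate(coeffs))
    return -20 * math.log10(max(abs(h), 1e-12) / abs(sum(coeffs)))

def worst_attenuation_db(coeffs, k, points=201):
    """Smallest attenuation over a grid covering the kth folding band."""
    lo, hi = k / D - FC, min(k / D + FC, 0.5)
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return min(attenuation_db(coeffs, f) for f in grid)

def solve(stopband_db, ripple_db, max_order=6):
    """Exhaustive search for the cheapest orders meeting both constraints."""
    names = sorted(CPS)
    bands = range(1, D // 2 + 1)
    att = {n: [worst_attenuation_db(CPS[n], k) for k in bands] for n in names}
    droop = {n: attenuation_db(CPS[n], FC) for n in names}
    best = None
    for orders in product(range(max_order + 1), repeat=len(names)):
        bands_ok = all(
            sum(o * att[n][j] for o, n in zip(orders, names)) >= stopband_db
            for j in range(len(bands)))
        ripple_ok = sum(o * droop[n] for o, n in zip(orders, names)) <= ripple_db
        cost = sum(o * ADDS[n] for o, n in zip(orders, names))
        if bands_ok and ripple_ok and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, orders)))
    return best

best = solve(stopband_db=60.0, ripple_db=1.0)
print(best)
```

Summing per-CP worst cases is conservative, since the individual minima need not occur at the same frequency. Exhaustive search is only viable here because the candidate set is tiny; with tens of eligible CPs an integer programming solver is the practical choice.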
The results of the previous optimization problem are summarized in Table I for various specifications on the selectivity, and for two different values of the passband ripple (one of them being 2 dB). We solved the problem for three different values of the decimation factor of the first stage in the decimation chain depicted in Fig. 1, by assuming a given residual decimation factor for the remaining stages. Notice that such an approach is quite usual in practice, in that the first decimation filter accomplishes the highest possible decimation in order to reduce the sampling rate, while the subsequent decimation stages are usually accomplished with half-band filters, each one decimating by 2 [1]. The first row related to any decimation factor shows the set of eligible CPs found in the preliminary design step discussed above, while the z-transfer functions of the CPs can be found in Table II, shown later [32].

It is worth comparing the frequency responses of the optimized filters in Table I with the imposed specifications. To this end, Figs. 2 and 3 show, respectively, the behaviours of the frequency responses for $D = 8$ and $D = 16$ along with the imposed selectivity around the various folding bands (identified by horizontal bold lines).

V. IMPLEMENTATION ISSUES AND COMPARISONS

This section addresses the design of optimized CP-based decimation filters. For conciseness, we will focus on the design of one decimation filter shown in Table I, even though the considerations which follow can be applied quite straightforwardly to any other decimation filter. The decimation stage related to this filter is depicted in Fig. 4(a): this decimation filter will be designed through a variety of architectures stemming from different mathematical ways to simplify the analytical relation defining it. First of all, notice that upon substituting the appropriate equations of the constituent CP filters, the designed filter takes on the expression given in (19).

TABLE I. OPTIMIZATION RESULTS

Fig. 2. Behaviors in decibels of the modulus of the frequency responses of the three optimized decimation filters shown in Table I for $D = 8$.

Fig. 3. Behaviors in decibels of the modulus of the frequency responses of the three optimized decimation filters shown in Table I for $D = 16$.

Expression (19) can be rewritten as in (20). From the commutative property employed in [12], the cascaded implementation shown in Fig. 4(b) easily follows. The $i$th stage in Fig. 4(b) operates at the sampling rate $f_s / 2^{i-1}$, whereby $f_s$ is the data sampling rate at the filter input, as shown in the multistage architecture in Fig. 1. Further power consumption reduction can be achieved by applying polyphase decomposition to the architecture shown in Fig. 4(b). To this aim, consider the z-transfer function of the third-order cell given in (21). The polyphase architecture for this cell easily follows from the commutative property applied to the two constituent filters

LADDOMADA: DESIGN OF MULTISTAGE DECIMATION FILTERS USING CPS 1983 Fig. 4. (a) Efficient architectures for implementing the decimation stage embedding H (z). (b) Nonrecursive architecture. (c) Polyphase implementation of the decimation stages decimating by 2. (d) Polyphase component implementation using shift registers. (e) Recursive architecture of the decimation filter H (z). in (21), and it is shown in Fig. 4(c) along with the architectures for implementing both and. Notice that the multipliers appearing in and can be implemented in the form of shift registers as depicted in Fig. 4(d). The actual complexity of the architecture shown in Fig. 4(b) is fully defined once the data wordlength in any substage is well characterized, since the power consumption of a filter cell can be approximated as the product between the data rate, the number of additions performed at that specific rate, and the data wordlength. While the data rate along with the number of additions are well defined, data wordlength in each substage in Fig. 4(b) is not. Given the input data wordlength, (in bits), the data size at the output of the first decimation substage in Fig. 4(b) is equal to bits, since two carry bits have to be allocated for the two additions involved in that substage. With a similar reasoning, data wordlength increases at the output of each subsequent substage in Fig. 4(b) in order to take into account the increase of data size due to the involved additions. As a reference example, if the decimation filter depicted in Fig. 4(b) is the first decimation stage at the output of a A/D converter embedding a 1-bit quantizer into the loop, it is. Thus, data wordlength is as low as 3 bits after the first decimation substage, and so on. Let us address the design of a recursive architecture for in (19). First of all, consider the following equality chain: factors of the form are quite common in practice. 
In (22), the first equality holds for any $D$ that can be written as an integer power of 2, i.e., $D = 2^p$. On the other hand, the last equality holds for any integer value of $D$.

Upon using (22), (19) can be rewritten as in (23). The last relation in (23) can be simplified into (24). A recursive implementation of the filter in (24) is shown in Fig. 4(e). It is obtained in the same way as a classic cascaded integrator-comb filter implementation [11]. In other words, the numerator in (24) corresponds to the comb sections at the right of the decimator by $D$,² while the denominator is responsible for the integrator sections at the left of the decimator by $D$.

The derivation yielding (24) from (23) can also be accomplished by following another line of reasoning³ based on the relation in (25), which holds for positive arguments with an odd integer parameter. By doing so, (19) can be rewritten as in (26).

² Notice that $(1 - z^{-D})$ becomes $(1 - z^{-1})$ upon its shifting through the decimator by D = 8.
³ We discuss this other approach for completeness, since it can be effective for deriving an appropriate architecture for the other decimation filters shown in Table I.
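The two ideas used above can be checked numerically with a minimal sketch. Since the extracted text does not preserve (22) itself, the chain is written here in its textbook form, $\prod_{i=0}^{p-1}(1 + z^{-2^i}) = \sum_{i=0}^{D-1} z^{-i} = (1 - z^{-D})/(1 - z^{-1})$ with $D = 2^p$, which is an assumption about the exact layout of the original equation. The second check uses a plain first-order comb rather than the optimized filter of (24), and shows the integrator-comb arrangement of [11], in which the comb term becomes $(1 - z^{-1})$ once moved to the right of the decimator. All helper names are illustrative.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def power_of_two_product(p):
    """Coefficients of prod_{i=0}^{p-1} (1 + z^-(2^i))."""
    result = [1]
    for i in range(p):
        factor = [1] + [0] * (2 ** i - 1) + [1]
        result = poly_mul(result, factor)
    return result


def fir_then_decimate(x, h, D):
    """Direct-form FIR filtering at the input rate, then keep every D-th output."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(len(x))]
    return y[::D]


def integrator_comb(x, D):
    """Integrator at the input rate, decimation by D, comb (1 - z^-1) at the low rate."""
    acc, integrated = 0, []
    for sample in x:
        acc += sample                  # 1 / (1 - z^-1)
        integrated.append(acc)
    low_rate = integrated[::D]         # decimator by D
    prev, out = 0, []
    for v in low_rate:
        out.append(v - prev)           # (1 - z^-D) after shifting through the decimator
        prev = v
    return out


D, p = 8, 3
boxcar = [1] * D                       # sum_{i=0}^{D-1} z^-i
x = [(7 * n) % 5 - 2 for n in range(64)]    # arbitrary integer test signal
chain_holds = power_of_two_product(p) == boxcar
cic_matches = integrator_comb(x, D) == fir_then_decimate(x, boxcar, D)
print(chain_holds, cic_matches)
```

Integer arithmetic is used throughout so that the comparison between the recursive and nonrecursive forms is exact rather than approximate.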

Upon simplifying, (26) yields (24).

TABLE II. THE FIRST SIXTY CPs

An alternative nonrecursive architecture stems from a full polyphase decomposition of the transfer function. Upon carrying out the multiplications between the polynomials involved in (19), the transfer function can be rewritten as in (27). By applying the polyphase decomposition [35], it can then be rewritten as in (28), whereby the length of the impulse response enters as a parameter. The z-transfer function in (28) is implemented with the architecture

shown in Fig. 5(a). The polyphase components can be easily obtained by employing (27). In particular, the first two polyphase components take on the expressions in (29).

Fig. 5. (a) Architecture of the polyphase implementation of the decimation filter H(z). (b) Efficient design of the first two polyphase components.

An efficient architecture for implementing each polyphase component stems from the decomposition of each integer coefficient as a sum of power-of-two terms, as shown in (29) for the first two polyphase components. By doing so, and by sharing coefficients, practical architectures featuring a minimum number of shift registers easily follow, as depicted in Fig. 5(b). Similar considerations can be employed for obtaining the architectures of the remaining polyphase components.

A. Comparisons

In this section, we compare the filters designed in the previous section with classical comb filters as well as with three other techniques recently proposed in the literature. For the sake of comparing the designed filter with known decimation filters, consider once again the specifications (in dB) noted in Table I. The most efficient decimation filters for decimating oversampled signals are comb filters [1], [23]: like the decimation filters proposed in this paper, comb filters do not require real multiplications. We recall that the transfer function of an $N$th-order comb filter is defined as [11]

$$H_N(z) = \left( \frac{1}{D} \cdot \frac{1 - z^{-D}}{1 - z^{-1}} \right)^{N} \qquad (30)$$

where $D$ is the desired decimation factor.

Fig. 6. (a) Nonrecursive architecture of a third-order comb filter decimating by D = 2. (b) Block diagram of the filter H(z). (c) Architecture of a fourth-order comb filter. (d) Architecture of a third-order MSDF filter (with coefficients a and b as defined in [17], and q = 0.79). (e) Architecture of an Lth-order SCD filter. (f) Architecture of a two-stage sharpened comb decimator filter.
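To make the comb filter of (30) and the polyphase idea behind Fig. 5(a) concrete, the sketch below builds the integer impulse response of an $N$th-order comb, splits it into $D$ polyphase components, and checks that the polyphase form gives the same decimated output as filtering at the high rate. The $1/D^N$ gain is deliberately left out so that all coefficients stay integers, and each integer tap is applied through shifts and additions in the spirit of the power-of-two decomposition described above. Function names and the test signal are illustrative and are not taken from the paper.

```python
def comb_impulse_response(D, N):
    """Integer impulse response of (sum_{i=0}^{D-1} z^-i)^N, gain D^N not removed."""
    h = [1]
    for _ in range(N):
        nxt = [0] * (len(h) + D - 1)
        for i, hi in enumerate(h):
            for j in range(D):
                nxt[i + j] += hi
        h = nxt
    return h


def polyphase_components(h, D):
    """Component k holds the taps h[k], h[k + D], h[k + 2D], ..."""
    return [h[k::D] for k in range(D)]


def power_of_two_terms(c):
    """Exponents e such that the non-negative integer c equals the sum of 2**e."""
    return [e for e in range(c.bit_length()) if (c >> e) & 1]


def filter_then_decimate(x, h, D):
    """Reference: FIR filtering at the input rate, then keep every D-th output."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(len(x))]
    return y[::D]


def polyphase_decimate(x, h, D):
    """Same output, with every tap applied once per low-rate output sample."""
    comps = polyphase_components(h, D)
    n_out = (len(x) + D - 1) // D
    out = []
    for m in range(n_out):
        acc = 0
        for k, e_k in enumerate(comps):
            for r, tap in enumerate(e_k):
                idx = (m - r) * D - k        # input sample feeding branch k
                if 0 <= idx < len(x):
                    # integer tap realized with shifts and additions only
                    acc += sum(x[idx] << e for e in power_of_two_terms(tap))
        out.append(acc)
    return out


h = comb_impulse_response(D=2, N=3)          # third-order comb decimating by 2
x = [(5 * n) % 7 - 3 for n in range(40)]     # arbitrary integer test signal
print(h, polyphase_components(h, 2))
print(polyphase_decimate(x, h, 2) == filter_then_decimate(x, h, 2))
```

For the third-order comb decimating by 2, the taps are 1, 3, 3, 1, so the only nontrivial coefficient is 3 = 2 + 1, which costs one shift and one addition per branch.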
In order to attain these specifications for the decimation factor under consideration, a third-order comb filter is required [$N = 3$ in (30)]. Such a comb filter can be implemented using the nonrecursive architecture shown in Fig. 6(a) [23]. A quick comparison between the architecture shown in Fig. 6(a) and that of the designed filter, shown in Fig. 4(b), indicates a reduction of one bit in the word size in the first stage, because one addition is saved with respect to comb filters. Of course, this reduction is also inherited by the successive stages in the decimation chain. We point out that comb filters are the most efficient filters known so far; in this respect, any further reduction of the word size in a practical implementation is a challenging task.

As a second comparison example, consider the second set of specifications (decimation factor and levels in dB) noted in Table I. The designed filter can be implemented by the architecture shown in Fig. 6(b). On the other hand, the same specifications could be attained by the comb filter in (30) upon choosing $N = 4$. The architecture implementing a fourth-order comb filter is depicted in Fig. 6(c). Once again, a comparison between the architecture shown in Fig. 6(c) and that of the proposed filter [depicted in Fig. 6(b)] reveals a data size reduction of 1 bit in the second decimation stage, along with 2 bits in the third stage. From a practical point of view, data size

reduction turns out to be quite effective for reducing the power consumption of the designed filters.

Consider the modified-sinc decimation filters (MSDFs) proposed in [17], along with the specifications (in dB) noted in Table I. In order to attain these specifications, a third-order MSDF is required. The MSDF, which can be implemented using the recursive architecture proposed in [17] (with the optimal value q = 0.79) and shown in Fig. 6(d), requires one real multiplier operating at the high input frequency as well as one real multiplier operating at the reduced frequency after the decimation by $D$. A comparison between the architecture shown in Fig. 6(d) and that of the designed filter, shown in Fig. 4(e), shows that the recursive architecture for implementing the proposed filter is computationally efficient in that it does not require any real multiplication.

Let us focus on the sharpened CIC decimation (SCD) filters proposed in [22]. An $L$th-order SCD filter decimating by $D$ presents the frequency response in (31). In order to meet the specifications (in dB) noted in Table I, an SCD filter of suitable order is required under the setup above. The architecture implementing such a filter is depicted in Fig. 6(e), whereby the constituent block is a second-order comb filter as defined in (30), i.e., with $N = 2$. The computational complexity required by this class of filters is well above that of the filter designed with the proposed technique, independently of the specific architecture (recursive or nonrecursive) adopted for implementing the comb filter.

The last comparison regards the technique proposed in [24] for the design of two-stage sharpened comb decimator (2SSCD) filters.
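The sharpening idea behind the SCD filters discussed above can be illustrated with a short sketch. Equation (31) is not preserved in the extracted text, so the code assumes the Kaiser-Hamming sharpening polynomial $3H^2 - 2H^3$ applied to the zero-phase amplitude of an $N$th-order comb, which is the form commonly associated with [22]. The test frequencies and the function names are arbitrary choices made for illustration.

```python
import math


def comb_amplitude(f, D, N):
    """Zero-phase amplitude of an Nth-order comb; f is normalized to the input rate."""
    if f == 0.0:
        return 1.0                           # limit of the ratio at f = 0
    return (math.sin(math.pi * f * D) / (D * math.sin(math.pi * f))) ** N


def sharpened_amplitude(f, D, N):
    """Assumed sharpening polynomial 3H^2 - 2H^3 applied to the comb amplitude."""
    h = comb_amplitude(f, D, N)
    return 3 * h ** 2 - 2 * h ** 3


D, N = 8, 2
f_pass = 1 / (8 * D)              # a frequency inside the passband (arbitrary)
f_stop = 1 / D - 1 / (8 * D)      # folds back onto f_pass after decimation by D
for name, f in (("passband", f_pass), ("stopband", f_stop)):
    print(name, comb_amplitude(f, D, N), sharpened_amplitude(f, D, N))
```

Under these assumptions the sharpened response is closer to 1 in the passband and smaller in the folding band, at the price of cascading three copies of the underlying comb, which is consistent with the higher computational complexity noted above.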
The transfer function of a general 2SSCD filter is defined as in (32) [24], whereby the two stage parameters are suitable integers greater than zero, chosen in order to satisfy the requirement in [24]. In order to meet the specifications (in dB) noted in Table I, a 2SSCD filter with suitably chosen parameters is required under the setup above. The architecture for implementing such a filter is shown in Fig. 6(f). Once again, the computational complexity required by the designed 2SSCD filter is well above that of the filter depicted in Fig. 4(b).

VI. CONCLUSION

This paper addressed the design of multiplier-less decimation filters suitable for oversampled digital signals. The aim was twofold. On one hand, it proposed an optimization framework for the design of the constituent decimation filters in a general multistage decimation architecture, using CPs as basic building blocks, since the first 104 CPs have simple coefficients (−1, 0, or +1). On the other hand, the paper provided a set of useful techniques, most of which stem from some key properties of CPs, for designing the optimized filters in a variety of architectures. Both recursive and nonrecursive architectures have been discussed by focusing on a specific decimation filter obtained as a result of the optimization algorithm. Design guidelines were provided with the aim of simplifying the design of the constituent decimation filters in the multistage chain.

REFERENCES

[1] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 1983.
[2] J. Mitola, “The software radio architecture,” IEEE Commun. Mag., vol. 33, no. 5, pp. 26–38, May 1995.
[3] M. Laddomada, F. Daneshgaran, M. Mondin, and R. M. Hickling, “A PC-based software receiver using a novel front-end technology,” IEEE Commun. Mag., vol. 39, no. 8, pp. 136–145, Aug. 2001.
[4] F. Daneshgaran and M. Laddomada, “Transceiver front-end technology for software radio implementation of wideband satellite communication systems,” Wireless Personal Commun., vol. 24, no. 12, pp. 99–121, Dec. 2002.
[5] A. A. Abidi, “The path to the software-defined radio receiver,” IEEE J. Solid-State Circuits, vol. 42, no. 5, pp. 954–966, May 2007.
[6] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta-Sigma Data Converters, Theory, Design, and Simulation. New York: IEEE Press, 1997.
[7] R. E. Crochiere and L. R. Rabiner, “Interpolation and decimation of digital signals-a tutorial review,” Proc. IEEE, vol. 69, no. 3, pp. 300–331, Mar. 1981.
[8] P. P. Vaidyanathan, “Multirate digital filters, filter banks, polyphase networks, and applications: A tutorial,” Proc. IEEE, vol. 78, no. 1, pp. 56–93, Jan. 1990.
[9] M. W. Coffey, “Optimizing multistage decimation and interpolation processing, Part I,” IEEE Signal Process. Lett., vol. 10, no. 4, pp. 107–110, Apr. 2003.
[10] M. W. Coffey, “Optimizing multistage decimation and interpolation processing, Part II,” IEEE Signal Process. Lett., vol. 14, no. 1, pp. 24–26, Jan. 2007.
[11] E. B. Hogenauer, “An economical class of digital filters for decimation and interpolation,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 155–162, Apr. 1981.
[12] S. Chu and C. S. Burrus, “Multirate filter designs using comb filters,” IEEE Trans. Circuits Syst., vol. CAS-31, no. 11, pp. 913–924, Nov. 1984.
[13] R. A. Losada and R. Lyons, “Reducing CIC filter complexity,” IEEE Signal Process. Mag., vol. 23, no. 4, pp. 124–126, Jul. 2006.
[14] Y. Gao, J. Tenhunen, and H. Tenhunen, “A fifth-order comb decimation filter for multistandard transceiver applications,” in Proc. ISCAS, Geneva, Switzerland, May 28–31, 2000, pp. III-89–III-92.
[15] F. J. A. de Aquino, C. A. F. da Rocha, and L. S. Resende, “Design of CIC filters for software radio system,” in Proc. IEEE ICASSP, 2006, vol. 3, pp. 225–228.
[16] T. Ze and S. Signell, “Multi-standard delta-sigma decimation filter design,” in Proc. IEEE APCCAS, Dec. 4–7, 2006, pp. 1212–1215.
[17] L. L. Presti, “Efficient modified-sinc filters for sigma-delta A/D converters,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 11, pp. 1204–1213, Nov. 2000.
[18] M. Laddomada, “Generalized comb decimation filters for ΣΔ A/D converters: Analysis and design,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 5, pp. 994–1005, May 2007.
[19] M. Laddomada and M. Mondin, “Decimation schemes for ΣΔ A/D converters based on Kaiser and Hamming sharpened filters,” Proc. IEE Vision, Image Signal Process., vol. 151, no. 4, pp. 287–296, Aug. 2004.
[20] F. Daneshgaran and M. Laddomada, “A novel class of decimation filters for ΣΔ A/D converters,” Wireless Commun. Mobile Comput., vol. 2, no. 8, pp. 867–882, Dec. 2002.
[21] M. Laddomada, “Comb-based decimation filters for ΣΔ A/D converters: Novel schemes and comparisons,” IEEE Trans. Signal Process., vol. 55, no. 5, pt. 1, pp. 1769–1779, May 2007.
[22] A. Y. Kwentus, Z. Jiang, and A. N. Willson, Jr., “Application of filter sharpening to cascaded integrator-comb decimation filters,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 457–467, Feb. 1997.
[23] H. Aboushady, Y. Dumonteix, M. Louérat, and H. Mehrez, “Efficient polyphase decomposition of comb decimation filters in ΣΔ analog-to-digital converters,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 10, pp. 898–903, Oct. 2001.
[24] G. Jovanovic-Dolecek and S. K. Mitra, “A new two-stage sharpened comb decimator,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1414–1420, Jul. 2005.
[25] R. J. Hartnett and G. F. Boudreaux-Bartels, “On the use of cyclotomic polynomial prefilters for efficient FIR filter design,” IEEE Trans. Signal Process., vol. 41, no. 5, pp. 1766–1779, May 1993.
[26] H. J. Oh and Y. H. Lee, “Design of efficient FIR filters with cyclotomic polynomial prefilters using mixed integer linear programming,” IEEE Signal Process. Lett., vol. 3, no. 8, pp. 239–241, Aug. 1996.
[27] H. J. Oh and Y. H. Lee, “Design of discrete coefficient FIR and IIR digital filters with prefilter-equalizer structure using linear programming,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 6, pp. 562–565, Jun. 2000.
[28] K. Supramaniam and Y. Lian, “Complexity reduction for frequency-response masking filters using cyclotomic polynomial prefilters,” in Proc. IEEE ISCAS, May 21–24, 2006.
[29] M. R. Schroeder, Number Theory in Science and Communication: With Applications in Cryptography, Physics, Digital Information, Computing, and Self-Similarity, 3rd ed. New York: Springer-Verlag, 1997.
[30] J. H. McClellan and C. M. Rader, Number Theory in Digital Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 1979.
[31] E. W. Weisstein, “Cyclotomic Polynomial,” Mathworld (A Wolfram web resource), Champaign, IL, Internal Rep. [Online]. Available: http://mathworld.wolfram.com/cyclotomicpolynomial.html
[32] M. Laddomada, “Some properties along with the z-transfer functions of the first 104 cyclotomic polynomials,” Politecnico di Torino, Turin, Italy, Internal Rep., 2007. [Online]. Available: http://www.tlc.polito.it/dcc_team/research.php?id=4
[33] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization, Algorithms and Complexity. Mineola, NY: Dover, 1998.
[34] Mixed Integer Linear Programming Matlab file, The Mathworks Inc., Natick, MA. [Online]. Available: http://www.mathworks.com/matlabcentral/fileexchange/loadfile.do?objectid=6990&objecttype=file
[35] A. Antoniou, Digital Signal Processing: Signals, Systems, and Filters. New York: McGraw-Hill, 2005.

Massimiliano Laddomada (S'00–M'03) was born in 1973. He received the degree in electronics engineering and the Ph.D. degree in communications engineering from Politecnico di Torino, Turin, Italy, in 1999 and 2003, respectively. From June 2000 to March 2001, he was a Visiting Researcher at California State University (CSU), Los Angeles, and a Consultant Engineer with Technoconcepts, Inc., Los Angeles, a start-up company specializing in software radio. He was a Research Associate at Politecnico di Torino from 2006 to 2008, and he has been a part-time faculty member at CSU since 2006. In 2008, he joined the Department of Electrical Engineering, Texas A&M University, Texarkana, as an Assistant Professor. His research is mainly in wireless communications, especially modulation and coding, including turbo codes and, more recently, network coding. Dr. Laddomada was awarded a five-year open-ended fellowship by E.D.S.U. in recognition of his university career as an Electronics Engineer. In 2003, he was awarded the Premio Zucca per l'Innovazione nell'ICT from Unione Industriale of Turin. He is currently serving as a member of the editorial boards of IEEE Communications Surveys and Tutorials and the International Journal of Digital Multimedia Broadcasting (Hindawi).