IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 4, FEBRUARY 15, 2013


A High-Performance Energy-Efficient Architecture for FIR Adaptive Filter Based on New Distributed Arithmetic Formulation of Block LMS Algorithm

Basant K. Mohanty, Senior Member, IEEE, and Pramod Kumar Meher, Senior Member, IEEE

Abstract: In this paper, we present an efficient distributed-arithmetic (DA) formulation for the implementation of the block least mean square (BLMS) algorithm. The proposed DA-based design uses a novel look-up table (LUT)-sharing technique for the computation of the filter outputs and weight-increment terms of the BLMS algorithm. It also offers a significant saving of adders, which constitute a major component of DA-based structures. We have also suggested a novel LUT-based weight-updating scheme for the BLMS algorithm, where only one set of LUTs out of $M$ sets needs to be modified in every iteration, where $N = ML$, and $N$ and $L$ are, respectively, the filter length and input block-size. Based on the proposed DA formulation, we have derived a parallel architecture for the implementation of the BLMS adaptive digital filter (ADF). Compared with the best of the existing DA-based LMS structures, the proposed one involves nearly times adders and times LUT words, and offers nearly times throughput of the other. It requires nearly 25% more flip-flops and does not involve variable shifters like those of existing structures. It involves fewer LUT accesses per output (LAPO) than the existing structure for block-sizes higher than 4. For block-size 8 and filter length 64, the proposed structure involves 2.47 times more adders, 15% more flip-flops, and 43% less LAPO than the best of the existing structures, and offers 5.22 times higher throughput. The number of adders of the proposed structure does not increase proportionately with block-size, and the number of flip-flops is independent of block-size.
This is a major advantage of the proposed structure for reducing its area-delay product (ADP), particularly when a large-order ADF is implemented for higher block-sizes. ASIC synthesis results show that the proposed structure for filter length 64 has almost 14% and 30% less ADP, and 25% and 37% less EPO, than the best of the existing structures for block-sizes 4 and 8, respectively.

Index Terms: Adaptive filters, block LMS, distributed arithmetic, VLSI.

Manuscript received June 18, 2012; accepted October 07, Date of publication October 25, 2012; date of current version January 25, The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Zhiyuan Yan. B. K. Mohanty is with the Department of Electronics and Communication Engineering, Jaypee University of Engineering and Technology, Raghogarh, Guna, Madhya Pradesh, India (bk.mohanti@juet.ac.in). P. K. Meher is with the Institute for Infocomm Research, 1 Fusionopolis Way, Singapore (pkmeher@i2r.a-star.edu.sg). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TSP

I. INTRODUCTION

Adaptive digital filters (ADFs) are widely used in various signal-processing applications, such as echo cancellation, system identification, noise cancellation and channel equalization [1]. Amongst the existing ADFs, the least mean square (LMS)-based finite impulse response (FIR) adaptive filter is the most popular one due to its inherent simplicity and satisfactory convergence performance. However, the delay in the availability of the feedback error for updating the weights according to the LMS algorithm does not favor its pipelined implementation when the sampling rate is high. Haimi et al. [2] have proposed the delayed LMS (DLMS) algorithm for pipelined implementation of the LMS-based ADF.
The delayed LMS is similar to the LMS algorithm except that the correction terms for updating the filter weights of the current iteration are calculated from the error corresponding to a past iteration. Several schemes have been proposed to implement DLMS-based ADFs efficiently in systolic VLSI with minimum adaptation delay [2]-[4], [7], [8]. To avoid adaptation delay in the pipelined LMS ADF, Poltmann [5] has proposed a modified DLMS algorithm, which is used by Douglas et al. [6] to derive a systolic architecture. However, the structure of [6] involves a large amount of hardware resources compared to the earlier one [2].

The block LMS (BLMS) ADF [9] is one of the useful derivatives of the LMS ADF for fast and computationally efficient implementation of ADFs. Unlike the conventional LMS ADF, the BLMS ADF accepts a block of input for computing a block of output, and updates the weights using a block of errors in every training cycle. The BLMS ADF has convergence performance similar to that of the LMS ADF, but the BLMS ADF of block-length $L$ offers $L$-fold higher throughput compared with the other. Keeping this in view, many variants of the BLMS algorithm, like the time- and frequency-domain block filtered-x LMS (BFXLMS), have been proposed for specific applications [20]. Das et al. [21] have proposed an efficient BFXLMS using the FFT and fast Hartley transform (FHT), which is computationally more efficient. We have proposed a delayed block LMS (DBLMS) algorithm [15] and a concurrent multiplier-based architecture for high-throughput pipelined implementation of BLMS ADFs. The structure of [15] provides $L$-fold higher throughput rate and demands $L$ times more resources compared to those of the DLMS ADF. Baghel et al. [17], [18] have suggested a distributed-arithmetic (DA)-based structure for FPGA implementation of BLMS ADFs. A low-complexity design has been proposed in [19] for BLMS ADFs.
This structure supports only a very low sampling rate, since it uses a single multiply-accumulate (MAC) cell for the computation of the filter output and the weight-increment term.

To take advantage of DA-based hardware designs [12], Allred et al. [10] have suggested a scheme to derive a DA-based design for the LMS ADF. The structure of [10] requires separate look-up tables (LUTs) for the calculation of the filter output and the weight-increment terms. The LUTs used for the computation of the filter output and the weight-increment term of the DA LMS ADF are named DA-F-LUT and DA-A-LUT, respectively. In every iteration, the entire content of DA-F-LUT is updated to compute the weight-increment term, whereas half the content of DA-A-LUT is updated to accommodate the new input sample arriving at the current iteration. Updating the LUTs is the most time-consuming operation in the DA-based LMS ADF, since the updating is performed sequentially at different LUT locations. The LUT-update time, therefore, depends on the size of the LUT to be updated. For most practical adaptive filters, we need to use a decomposition scheme in which small LUTs can be used in the DA-based LMS ADF, which helps in reducing not only the LUT size but also the LUT-update time. Recently, Guo et al. [16] have suggested a scheme to avoid the DA-A-LUT in the DA-based LMS ADF, where both filtering and weight-updating are performed using the DA-F-LUT. On the other hand, the throughput rate of existing DA-LMS ADFs could be too slow for real-time applications due to the bit-serial nature of DA computation.

Although there is some interesting work on DA-based LMS ADFs [10], [16], we find that the potential application of DA to the implementation of the BLMS ADF is yet to be explored. In order to reduce the power consumption of DA-based designs, we aim at reducing the number of words in the LUT and the number of LUT accesses. A DA-based BLMS ADF structure can be derived by extending the scheme of [10], but this structure would demand $L$ times more hardware (memory and combinational logic) for $L$ times more throughput rate.
The scheme of [16] offers sharing of the LUT for the computation of both the filter output and the weight-increment term, but this scheme cannot be applied to derive a DA-based structure for BLMS ADFs, because separate inner-product computations (IPCs) are performed for the calculation of the filter output and the weight-increment term of the BLMS ADF, whereas in the case of the LMS ADF, IPC is performed to calculate the filter output only. In this paper, we have formulated the DA-BLMS algorithm for sharing of LUTs for the computation of filter outputs and weight-increment terms. The key contributions of this paper are:

• A DA-based formulation of the BLMS algorithm where both the convolution operation to compute the filter output and the correlation operation to compute the weight-increment term can be performed by using the same LUT.
• A novel approach for minimizing the number of LUT words to be updated per output. This helps to save external logic and power consumption.
• A DA-based structure for the BLMS ADF, derived using the proposed DA formulation and a novel LUT-updating scheme.

The most remarkable aspect of the proposed scheme is that the number of adders required by the structure does not increase proportionately with filter order, and the number of flip-flops required by the structure is independent of the block-size. Apart from that, the proposed structure has significantly fewer LUT accesses than the existing DA-LMS structure for higher block-sizes.

The rest of this paper is organized as follows: The mathematical formulation is presented in Section II. The new LUT-update scheme is discussed in Section III, and the proposed structure for the DA-based BLMS ADF is presented in Section IV. Hardware and time complexities of the proposed structure are discussed in Section V. The conclusion is presented in Section VI.

II. MATHEMATICAL FORMULATION

The BLMS algorithm for updating the filter weights in the $k$-th iteration is given by

$$\mathbf{w}_{k+1} = \mathbf{w}_k + \mu\,\Delta\mathbf{w}_k \qquad (1)$$

where $\Delta\mathbf{w}_k$ is defined as

$$\Delta\mathbf{w}_k = \mathbf{X}_k^T \mathbf{e}_k \qquad (2)$$

and $\mathbf{w}_k$ and $\mathbf{e}_k$ are, respectively, the weight-vector and the error-vector of the $k$-th iteration, defined as

$$\mathbf{w}_k = [w_k(0), w_k(1), \ldots, w_k(N-1)]^T, \qquad \mathbf{e}_k = [e(kL), e(kL-1), \ldots, e(kL-L+1)]^T$$

where $\mu$ is the step-size, and the input matrix $\mathbf{X}_k$ is derived from the current input block of length $L$ and $(N-1)$ past samples, given by

$$\mathbf{X}_k = \begin{bmatrix} x(kL) & x(kL-1) & \cdots & x(kL-N+1) \\ x(kL-1) & x(kL-2) & \cdots & x(kL-N) \\ \vdots & \vdots & \ddots & \vdots \\ x(kL-L+1) & x(kL-L) & \cdots & x(kL-L-N+2) \end{bmatrix}$$

The error-vector is computed as

$$\mathbf{e}_k = \mathbf{d}_k - \mathbf{y}_k \qquad (3)$$

where the desired response vector is defined as $\mathbf{d}_k = [d(kL), d(kL-1), \ldots, d(kL-L+1)]^T$. The $k$-th block of filter output $\mathbf{y}_k$ is computed by the matrix-vector product:

$$\mathbf{y}_k = \mathbf{X}_k \mathbf{w}_k \qquad (4)$$

A. Computation of Filter Output

The input matrix $\mathbf{X}_k$ of size $L \times N$ can be decomposed into $M$ square matrices $\mathbf{S}_k^j$ of size $L \times L$ each, where $N = ML$. Similarly, the weight vector $\mathbf{w}_k$ can be decomposed into $M$ short weight-vectors $\mathbf{c}_k^j$ of size $L$, for $0 \le j \le M-1$. The computation of (4) can then be expressed as the sum of $M$ matrix-vector products:

$$\mathbf{y}_k = \sum_{j=0}^{M-1} \mathbf{S}_k^j \mathbf{c}_k^j \qquad (5)$$

where $\mathbf{S}_k^j$ consists of columns $jL$ to $jL+L-1$ of $\mathbf{X}_k$, so that its $(i, l)$-th entry is $x((k-j)L - i - l)$ for $0 \le i, l \le L-1$, and $\mathbf{c}_k^j = [w_k(jL), w_k(jL+1), \ldots, w_k(jL+L-1)]^T$, for $0 \le j \le M-1$.
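The block formulation and its decomposition into $M$ short matrix-vector products can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the sample ordering (newest sample first in each row), and the test data are assumptions chosen for illustration. It computes a block of filter outputs both as one $L \times N$ product and as a sum over $L \times L$ sub-matrices, and also forms the unscaled weight-increment vector.

```python
def input_matrix(x, k, L, N):
    """L x N input matrix of iteration k; entry (i, n) is x(kL - i - n) (assumed ordering)."""
    assert k * L - (L - 1) - (N - 1) >= 0, "not enough past samples"
    return [[x[k * L - i - n] for n in range(N)] for i in range(L)]


def filter_block_direct(x, w, k, L):
    """Block of L filter outputs as a single L x N matrix-vector product."""
    X = input_matrix(x, k, L, len(w))
    return [sum(a * b for a, b in zip(row, w)) for row in X]


def filter_block_decomposed(x, w, k, L):
    """Same outputs as a sum of M products of L x L sub-matrices and short weight-vectors."""
    N = len(w)
    M = N // L
    assert M * L == N, "filter length must be a multiple of the block-size"
    assert k * L - (L - 1) - (N - 1) >= 0, "not enough past samples"
    y = [0] * L
    for j in range(M):
        c_j = w[j * L:(j + 1) * L]                # j-th short weight-vector
        for i in range(L):
            # i-th row of the j-th square sub-matrix
            s_ij = [x[(k - j) * L - i - l] for l in range(L)]
            y[i] += sum(s * c for s, c in zip(s_ij, c_j))
    return y


def weight_increment(x, e, k, L, N):
    """Weight-increment vector X^T e, before scaling by the step-size."""
    X = input_matrix(x, k, L, N)
    return [sum(X[i][n] * e[i] for i in range(L)) for n in range(N)]


def blms_update(w, dw, mu):
    """One BLMS weight update: w + mu * dw."""
    return [wn + mu * dn for wn, dn in zip(w, dw)]


# Hypothetical data: filter length N = 6, block-size L = 2, iteration k = 4.
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5, -8]
w = [1, -2, 3, 0, 2, -1]
print(filter_block_direct(x, w, k=4, L=2))      # [12, 15]
print(filter_block_decomposed(x, w, k=4, L=2))  # [12, 15]
```

Both routes give the same block of outputs, which is the property the decomposition relies on; the DA structure then evaluates each short inner-product by table look-up instead of multiplication.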

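Each filter output can now be written as the sum of $M$ inner-products as

$$y(kL - i) = \sum_{j=0}^{M-1} r(i, j) \qquad (6)$$

where $r(i, j)$ is an $L$-point inner-product of an input-vector $\mathbf{s}_i^j$ and the short weight-vector $\mathbf{c}^j$, given by

$$r(i, j) = \sum_{l=0}^{L-1} s_i^j(l)\, c^j(l) \qquad (7)$$

and $\mathbf{s}_i^j$ is the $i$-th row of $\mathbf{S}^j$, given by

$$\mathbf{s}_i^j = [x((k-j)L - i),\; x((k-j)L - i - 1),\; \ldots,\; x((k-j)L - i - L + 1)]$$

for $0 \le i \le L-1$ and $0 \le j \le M-1$. Note that we have dropped the subscript $k$ in (7) only for convenience of further discussion, without loss of generality.

B. Computation of Weight Increment Term

The weight-increment vector $\Delta\mathbf{w}_k$ can be decomposed into $M$ short vectors $\Delta\mathbf{c}_k^j$ of size $L$ each, for $0 \le j \le M-1$. Computation of (2) can be performed through $M$ independent matrix-vector multiplications using the relation

$$\Delta\mathbf{c}_k^j = (\mathbf{S}_k^j)^T \mathbf{e}_k \qquad (8)$$

where $\Delta\mathbf{c}_k^j$ is defined as

$$\Delta\mathbf{c}_k^j = [\Delta w_k(jL),\; \Delta w_k(jL+1),\; \ldots,\; \Delta w_k(jL+L-1)]^T \qquad (9)$$

Using (8), the individual weight-increment terms can be evaluated by the following equation:

$$\Delta w(jL + i) = p(i, j) \qquad (10)$$

where $p(i, j)$ is the inner-product between the vector $\mathbf{s}_i^j$ and $\mathbf{e}$, given by

$$p(i, j) = \sum_{l=0}^{L-1} s_i^j(l)\, e(kL - l) \qquad (11)$$

Here also we have dropped the subscript $k$ for convenience of further discussion. Since the $(i, l)$-th entry of $\mathbf{S}^j$ depends only on $i + l$, the matrix is symmetric and its $i$-th column equals its $i$-th row $\mathbf{s}_i^j$. As shown in (7) and (11), the input-vector is therefore the same for a pair of inner-products $r(i, j)$ and $p(i, j)$. This is a major advantage for optimizing the LUTs when the inner-products of (7) and (11) are performed using the DA principle.

C. DA-Formulation

Let $c(l)$ and $e(l)$, respectively, be the $l$-th components of the $L$-point vectors $\mathbf{c}$ and $\mathbf{e}$, assumed to be $B$-bit numbers in 2's complement representation:

$$c(l) = \sum_{b=0}^{B-1} \sigma_b\, 2^b\, c_b(l) \qquad (12a)$$

$$e(l) = \sum_{b=0}^{B-1} \sigma_b\, 2^b\, e_b(l) \qquad (12b)$$

where $c_b(l)$ and $e_b(l)$ are the $b$-th bits of $c(l)$ and $e(l)$, respectively, and $\sigma_b = 1$ for $0 \le b \le B-2$ and $\sigma_b = -1$ for $b = B-1$. Substituting (12a) in (7), we have

$$r = \sum_{l=0}^{L-1} s(l) \left[ \sum_{b=0}^{B-1} \sigma_b\, 2^b\, c_b(l) \right] \qquad (13)$$

Rearranging the order of summation, (13) may otherwise be expressed as:

$$r = \sum_{b=0}^{B-1} \sigma_b\, 2^b \left[ \sum_{l=0}^{L-1} s(l)\, c_b(l) \right] \qquad (14)$$

Each term in the inner sum in (14) represents the inner-product of $\mathbf{s}$ with a bit-vector (or bit-slice) of the weight-vector $\mathbf{c}$. Corresponding to the $2^L$ possible values of a bit-vector of length $L$, there could be $2^L$ possible values of such inner-products of $\mathbf{s}$ with any possible bit-vector of length $L$. All those possible inner-products could be pre-computed and stored in an LUT, such that when the $b$-th bit-vector (or bit-slice) of the weight-vector, for $0 \le b \le B-1$, is fed to the LUT as address, its inner-product with $\mathbf{s}$ is read from the LUT.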
The computation of the inner sum of (14), therefore, could be expressed in the form of a memory-read operation as:

$$\sum_{l=0}^{L-1} s(l)\, c_b(l) = F(\mathbf{c}_b) \qquad (15)$$

where $F(\cdot)$ is a memory-read operation, and its argument $\mathbf{c}_b = [c_b(0), c_b(1), \ldots, c_b(L-1)]$, for $0 \le b \le B-1$, is used as the LUT address. The inner-product of (11) may, similarly, be expressed in the form of a memory-read operation as

$$p = \sum_{b=0}^{B-1} \sigma_b\, 2^b\, F(\mathbf{e}_b) \qquad (16)$$

where $\sigma_b$ and $2^b$ are the sign and weight of the $b$-th bit as in (14), and $\mathbf{e}_b$ is the $b$-th bit-vector of the error-vector, defined as $\mathbf{e}_b = [e_b(0), e_b(1), \ldots, e_b(L-1)]$, which is used as the address of an LUT to read its inner-products with $\mathbf{s}$.

The LUT contents for the computation of $r$ and $p$ are exactly the same, since the LUT content depends on the input-vector $\mathbf{s}$ and is generated for all possible bit-slices of $L$-bit length, irrespective of whether the bit-slice is that of the weight-vector or the error-vector. When the bit-vector $\mathbf{c}_b$ is used as address, the partial results of $r$ are read from the LUT, and when $\mathbf{e}_b$ is used as address, the partial results of $p$ are read from the same LUT. Therefore, by using the proposed scheme, a common set of LUTs could be used for the computation of filter outputs and weight-increment terms. Since the block of input samples changes after every iteration, the LUTs are required to be updated in every iteration to accommodate the new input-block. In the next Section, we present a novel LUT-updating scheme for DA-based BLMS ADFs.
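The LUT-sharing idea can be illustrated with a small software model. This is a hedged sketch under stated assumptions, not the paper's hardware: the names `build_lut`, `bit_slice` and `da_inner_product` are hypothetical, integers stand in for fixed-point words, and bit `l` of the address is assumed to select sample `l`. One LUT is built from an input-vector and is then read with bit-slices of a weight-vector and, separately, of an error-vector.

```python
def build_lut(s):
    """LUT[a] = sum of s[l] over the set bits l of address a (2^L words)."""
    L = len(s)
    return [sum(s[l] for l in range(L) if (a >> l) & 1) for a in range(1 << L)]


def bit_slice(v, b, B):
    """Address made of bit b of every component of v (B-bit two's complement)."""
    mask = (1 << B) - 1
    return sum((((c & mask) >> b) & 1) << l for l, c in enumerate(v))


def da_inner_product(lut, v, B):
    """Bit-serial inner product of the LUT's input-vector with v, LSB to MSB."""
    acc = 0
    for b in range(B):
        term = lut[bit_slice(v, b, B)] * (1 << b)
        # The last bit-slice holds the sign bits, so its term is subtracted.
        acc += -term if b == B - 1 else term
    return acc


# Hypothetical data: one input-vector, one short weight-vector, one error-vector.
s = [3, -1, 4, 2]
c = [5, -3, 0, 7]
e = [-2, 1, -8, 6]
lut = build_lut(s)                   # built once from the input samples
print(da_inner_product(lut, c, 5))   # 32, equals sum(s[l] * c[l])
print(da_inner_product(lut, e, 5))   # -27, equals sum(s[l] * e[l])
```

The same table serves both reads because its content depends only on the input-vector; only the address stream changes between filtering and weight-increment computation.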

Fig. 1. (a) Inner-products of an FIR filter of length $N = 6$ and block-size $L = 2$. The input-vector corresponding to each inner-product is shown inside the box. (b) LUT arrangement for DA-based computation of the FIR filter of $N = 6$ and $L = 2$. Each LUT here stores the possible values of the partial inner-product of an input-vector and a bit-vector of length $L$.

III. LUT-UPDATING SCHEME

Before we discuss the proposed LUT-updating scheme, we summarize here the proposed decomposition of the input matrix and the weight-vector into small vectors, and their participation in the inner-product computation for the filtering operation. The input matrix of size $L \times N$ is decomposed into $M$ square matrices of size $L \times L$, and the weight-vector is decomposed into $M$ short vectors of size $L$, where $N = ML$. Each of the $L$ rows of a square matrix represents an input-vector, so that $L$ such input-vectors are derived from each of the $M$ square matrices. All these input-vectors are arranged in $L$ rows and $M$ columns such that the input-vectors derived from the same square matrix belong to the same column. According to (5), $M$ weight-vectors are multiplied independently with $M$ matrices, which, in total, involves $LM$ inner-products. According to (6), the results of the $M$ inner-products corresponding to each row of input-vectors are added together to obtain a filter output. From $L$ such rows of inner-products, $L$ filter outputs are obtained.

We illustrate here the aforementioned scheme for the implementation of an FIR filter of length $N = 6$ and block-size $L = 2$. Suppose that during the $k$-th iteration the filter receives an input-block and computes a block of output. As discussed above, the input matrix of size $2 \times 6$ is decomposed into 3 square matrices of size $2 \times 2$, each of which consists of a pair of input-vectors. The 6-point weight-vector is decomposed into three 2-point weight-vectors. Fig. 1(a) shows the arrangement of input-vectors and weight-vectors; the corresponding inner-products are shown on the top of the rectangular boxes for clarity. Results of the odd-numbered inner-products (on the upper row) and even-numbered inner-products (on the lower row) are added separately (not shown in the figure) to obtain the two filter outputs of the block, respectively.

Fig. 2. DA-based computation of the block FIR filter for $N = 6$ and $L = 2$. (a) For the $(k+1)$-th iteration. (b) For the $(k+2)$-th iteration.

As shown in Fig. 1(a), the same weight-vector is used for the computation of the inner-products of a particular column of input-vectors. For DA realization, the LUT corresponding to each input-vector stores the partial inner-products generated by the inner-product of that input-vector with all possible values of a bit-vector of length $L$. DA-based parallel computation of the filter outputs of Fig. 1(a) for the $k$-th iteration is shown in Fig. 1(b). As shown in Fig. 2(a), the DA-based structure receives an input-block during the $(k+1)$-th iteration, so that two new samples enter into the set of 7 samples, and the two oldest samples are discarded. Consequently, the samples of all 6 input-vectors are changed, but this occurs in a particular order. We can find from Fig. 1(b) and Fig. 2(a) that the contents of only the first column of LUTs of Fig. 2(a) are changed by the new samples, while in the other columns the LUT values remain the same. The positions of those unchanged LUTs, however, are shifted right by one column. For instance, the values stored in the LUTs of the second column of Fig. 2(a) are the same as the values stored in the LUTs of the first column of Fig. 1(b), and similarly the values stored in the LUTs of the third column of Fig. 2(a) are the same as those of the LUTs of the second column of Fig. 1(b). This feature can be observed in the LUT contents of Fig. 2(b) for the $(k+2)$-th iteration also. In other words, the contents of a particular column of LUTs during a particular iteration are simply transferred to the adjacent column of LUTs on its right during the next iteration.
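The column-shift property described above can be verified with a few lines of code. This is an illustrative sketch only: the function names and the indexing convention (entry `x[(k - j) * L - i - l]` for row `i`, column `j`, position `l`) are assumptions, and the input-vectors themselves stand in for the LUT contents they would generate.

```python
def input_vector_grid(x, k, L, M):
    """grid[i][j] is the input-vector of row i, column j at iteration k (assumed ordering)."""
    assert (k - M + 1) * L - 2 * (L - 1) >= 0, "not enough past samples"
    return [[[x[(k - j) * L - i - l] for l in range(L)] for j in range(M)]
            for i in range(L)]


def column(grid, j):
    """All input-vectors of column j, top row first."""
    return [row[j] for row in grid]


def samples_in_use(grid):
    """Set of distinct sample values referenced by the grid."""
    return {v for row in grid for vec in row for v in vec}


# Hypothetical data with distinct sample values; N = 6, L = 2, M = 3.
x = list(range(100, 120))
before = input_vector_grid(x, k=5, L=2, M=3)
after = input_vector_grid(x, k=6, L=2, M=3)

for j in (1, 2):
    # Column j of the next iteration equals column j - 1 of the current one.
    print(column(after, j) == column(before, j - 1))
```

With distinct sample values, the grid references exactly $N + L - 1 = 7$ samples, and only the first column of the next iteration contains samples that were not present before.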
In this way, the oldest input samples of a particular set are shifted out through the $M$-th column ($M = 3$ in the example) of LUTs, and new values are entered at the first column of LUTs. Shifting values physically from one LUT to the next across the array of LUTs is highly time-consuming and power-consuming. Therefore, we have proposed a novel LUT-updating scheme where the LUT content need not be shifted. Since each column of LUTs uses the same weight-vector as LUT address, the column-wise right-shift of LUT values can be achieved by a left-shift of the weight-vectors. This technique could save a lot of time and power, since the shifting of weight-vectors is significantly less expensive than the shifting of LUT contents. In the proposed LUT-update scheme, the contents of only one column of LUTs out of 3 such columns (for $M = 3$) need to be updated in every iteration.

Fig. 3. (a) Equivalent DA-based structure of Fig. 2(a), which is derived from the structure of Fig. 1(b) by changing the content of the 5th and 6th LUTs (shown in grey color) and left-shifting the weight-vectors by one position. (b) Equivalent DA-based structure of Fig. 2(b), derived from the structure of Fig. 3(a) by changing the content of the 3rd and 4th LUTs (shown in grey color) and left-shifting the weight-vectors by one position.

We can find from Fig. 1(b) and Fig. 2(a) that the values of the third-column LUTs of the $k$-th iteration are not used during the $(k+1)$-th iteration, since they correspond to the oldest block of samples. The LUTs of the third column are therefore updated, as shown in Fig. 3(a) in grey color. To feed the weight-vectors to the LUTs of Fig. 3(a) in the same order as that of Fig. 2(a), the weight-vectors of Fig. 1(b) are simply left-shifted by one location. As shown in Fig. 3(a), the second column of LUTs then contains the values corresponding to the oldest remaining block of samples; this input-block is discarded and the corresponding LUTs are updated by the partial inner-products of the new input-block. The weight-vectors of Fig. 3(a) are left-shifted by one column, and fed to the LUTs of Fig. 3(b) as addresses. In the following, we summarize the proposed scheme for updating the LUTs of the BLMS-based adaptive filter:

• LUTs are updated column-by-column in every iteration in cyclic order. The LUTs which store the values of partial inner-products corresponding to samples of the oldest input block are overwritten by those of the new input block.
• The weight-vectors are circularly left-shifted after every iteration to change the columns of LUTs to be read circularly.
• The values required for updating a column of LUTs for any particular iteration are calculated from the $L$ samples of the current input-block and the $(L-1)$ most recent samples of the previous block.

Based on the above scheme, the LUT-matrix is updated column-by-column from right to left after every iteration. The updating process starts from the $M$-th column of LUTs and goes to the first column in a cyclic manner, and then from the first column it goes again to the $M$-th column, then to the $(M-1)$-th column, and so on. Hence, the LUTs of one particular column are updated once in a period of $M$ iterations.

Fig. 4. Proposed DA-based structure for implementation of BLMS adaptive FIR filters (for $N = 16$ and $L = 4$).

IV. PROPOSED ARCHITECTURE

The proposed DA-BLMS structure is comprised of one DA-module, one error bit-slice generator (EBSG) and one weight-update cum bit-slice generator (WBSG). The WBSG updates the filter weights and generates the required bit-vectors in accordance with the DA-formulation. The EBSG computes the error block according to (3) and generates its bit-vectors. The DA-module updates the LUTs and makes use of the bit-vectors generated by the WBSG and EBSG to compute the filter output and weight-increment terms according to (15) and (16).

A. Structure for Block-Size $L = 4$

The proposed structure for the DA-based BLMS adaptive filter for $N = 16$ and $L = 4$ is shown in Fig. 4. The DA-module receives a block of input samples in every iteration, and computes a block of filter output. It also receives a block of errors in every iteration, and computes the weight-increment term for all the components of the weight-vector. The structure of the proposed DA-module is shown in Fig. 5. It consists of 4 identical processing elements (PEs), one LUT-update block and one MUX-array. The structure of the PE is shown in Fig. 6. It consists of 4 identical subcells (SCs). The internal structure and function of an SC is shown in Fig. 7.
As required by (15), the LUT of each SC of a PE stores the 16 possible values of the partial inner-product corresponding to its four input samples. The LUT-update block of the DA-module generates the required values to update the LUTs of a particular PE. The structure of the LUT-update block is shown in Fig. 8. It consists of one adder-block and an input delay unit (IDU), which stores $(L-1)$ samples of the previous block. During each iteration, the adder-block receives $(2L-1)$ samples ($L$ samples from the current input block and $(L-1)$ past samples from the IDU), and feeds these samples to $L$ adder-cells (ACs) (see Fig. 8) such that each AC receives $L$ samples, and the input blocks of adjacent ACs overlap by $(L-1)$ samples.

Fig. 5. Structure of the DA-module of the proposed DA-BLMS ADF (for $N = 16$ and $L = 4$).

Fig. 6. Internal structure of a processing element (PE) of the DA-module for block-size $L = 4$.

Fig. 7. Internal structure and function of a subcell (SC) of a PE. The convergence factor $\mu$ is assumed to be a power of 2.

For block-size $L = 4$, each AC receives a block of four samples in every iteration (shown in Fig. 9). As shown in the figure, each of the four inputs of the AC is ANDed with a bit of the four-bit address by four AND cells. Each AND cell consists of as many AND gates as the word-length of the input samples. All those AND gates are fed with one bit of the address, while the other inputs of the AND gates are fed with the bits of an input sample. The outputs of the AND cells are fed to an adder-tree (AT). The AC receives the 16 possible values of the four-bit address in 16 clock cycles, and calculates the 16 values to be stored in the LUT, where the equivalent integer value of the address is used as the address of the LUT location. All the ACs of the adder-block (see Fig. 8) work in parallel, and generate all the values required to update the LUTs of the SCs of a PE. According to the proposed LUT-update scheme, the LUTs of one PE out of $M$ PEs are updated in every iteration. The LUTs of all the PEs are updated once every $M$ iterations, in cyclic order. Each PE uses a separate control signal to enable the specific column of LUTs to be updated. The LUT-update operation of the proposed structure is completed during the first 16 clock cycles of every iteration.

Fig. 8. Internal structure of the LUT-update block for block-size $L = 4$.

Fig. 9. Internal structure of an AC of the LUT-update block for block-size $L = 4$.

Fig. 10. Internal structure of the MUX-array for $N = 16$ and $L = 4$.

Fig. 11. Structure of the error computation cum bit-slice generator (EBSG) for block-size $L = 4$.

Each PE receives the LUT-update address, the weight bit-vector, or the error bit-vector through the MUX-array (shown in Fig. 10) for updating the LUTs, computing the filter outputs, or computing the weight-increment terms, respectively. After completion of the LUT-update, the filtering computation follows immediately in the next clock cycles through a series of LUT-read operations using the bit-slices of the corresponding weight-vector in LSB-to-MSB order as successive addresses according to (15). During each cycle of filtering, the WBSG generates $M$ parallel bit-vectors of width $L$ bits each for the $M$ PEs to perform the filtering operation. Each SC receives a sequence of bit-vectors from the WBSG, one per bit of the filter-coefficient word-length, in successive clock cycles. The LUT-read values are shift-accumulated in an accumulator (ACC) to obtain a partial filter output. During the last of these cycles, the LUT output is subtracted from the accumulated result, since the bit-vector during this cycle contains the sign-bits of the weight-vector. Each SC uses the control signal CTR1 to control the add/subtract operation in the ACC. At the end of the last cycle, the ACC contents are sent to the DMUX as input, and the ACC register (not shown in Fig. 7) is cleared to be used for the computation of the weight-increment term from the next cycle (CTR1 is used for clearing the register).
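The LUT-update step performed by an adder-cell, as described earlier in this subsection, can be modelled in software as follows. This is a sketch under assumptions rather than a description of the actual circuit: the function names are hypothetical, Python integers stand in for fixed-width words, and bit `l` of the address is assumed to gate sample `l`. Each loop iteration corresponds to one of the 16 clock cycles for $L = 4$.

```python
def and_cell(sample, bit):
    """Word-wide AND of a sample with one address bit (0 or 1).

    -1 is all ones in two's complement, so bit = 1 passes the sample
    unchanged and bit = 0 forces the output to zero.
    """
    return sample & -bit


def adder_tree(values):
    """Pairwise reduction, as a balanced tree of two-input adders would perform."""
    values = list(values)
    while len(values) > 1:
        reduced = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            reduced.append(values[-1])   # odd element passes to the next level
        values = reduced
    return values[0]


def adder_cell(samples):
    """Generate the 2^L LUT words for one input-vector, one address per cycle."""
    L = len(samples)
    lut = []
    for address in range(1 << L):
        gated = [and_cell(s, (address >> l) & 1) for l, s in enumerate(samples)]
        lut.append(adder_tree(gated))    # written at LUT location `address`
    return lut


# Hypothetical block of four input samples.
lut = adder_cell([3, -1, 4, 2])
print(len(lut))        # 16
print(lut[0b0101])     # 7, the sum of samples 0 and 2
```

Because the address counter simply runs through all $2^L$ values, the update time is fixed by the LUT size, which matches the 16-cycle LUT-update phase stated in the text for $L = 4$.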
Finally, the DMUX sends the computed partial results of the inner-products to the output line using the select signal CTR6. From the $L$ SCs of each PE, $L$ partial results are obtained in parallel, and the corresponding outputs of each SC from the $M$ PEs are added by an AT (Fig. 5) to obtain one component of the $k$-th block of filter output. A block of $L$ parallel filter outputs is obtained from the $L$ ATs of the DA-module in each cycle.

The EBSG receives one block of filter output from the DA-module, and calculates a block of error in every iteration using one block of desired response according to (3). As shown in Fig. 11, the error values are loaded in parallel-in serial-out (PISO) shift-registers (SRs) of the bit-slice generator (BSG) to generate the bit-vectors of the error-vector. CTR4 enables the clock for the BSG, and CTR2 controls the load-shift operation of each SR. The bit-vectors are fed serially in LSB-to-MSB order to the DA-module in successive clock cycles to compute the weight-increment terms for the $k$-th iteration. According to (16), the LUT values used for the $k$-th block of filter output are also used to compute the weight-increment terms for the $k$-th iteration. In general, the LUT values of each SC of a PE are used to compute one weight-increment term, so that each PE computes the $L$ weight-increment terms of its own short weight-vector. The computation of the weight-increment terms is similar to that of the partial filter outputs, but in this case the same bit-vector is used by all the PEs of the DA-module to compute the weight-increment terms. In each SC (see Fig. 7), the ACC content corresponding to the weight-increment term is sent to the output line of the DMUX. The weight-increment terms are scaled by a factor $\mu$. Here we have assumed that $\mu$ is a power of 2, so that the scaling by $\mu$ is realized by a right-shift operation using a fixed shifter (see Fig. 7).

TABLE I. LUT UPDATING SCHEME FOR $N = 16$ AND $L = 4$ ($L$: BLOCK SIZE, $N$: FILTER ORDER)

According to (1), the WBSG of the proposed DA-BLMS structure requires only the weight-increment terms of the current iteration to update the weight-vector for the next iteration. It does not require the LUT values of the current iteration. Therefore, once the weight-increment terms of the current iteration are computed, the LUT-updating operation for the next iteration can be started immediately in the next clock cycle. As discussed earlier, the filter computation follows the LUT-update operation, and the first 16 clock cycles of every iteration are used to complete the LUT-update operation. During this period, the weight-update operation of the WBSG can also be performed concurrently. A bit-parallel (word-serial) structure of the WBSG requires one clock cycle to complete the weight-update operation, while a bit-serial structure of the WBSG requires as many clock cycles as the word-length of the filter coefficients to complete the same task. If the word-length of the filter coefficients is less than or equal to the LUT-size, then a bit-serial realization of the WBSG does not increase the iteration period of the DA-BLMS structure, but it certainly helps to reduce the hardware complexity of the DA-BLMS structure. Under this condition, we can have a bit-serial structure for the WBSG. The bit-serial structure of the WBSG receives the weight-increment terms from the DA-module in bit-serial LSB-to-MSB order, and updates the weight-vector accordingly.
For bit-serial realization of the WBSG, the weight-increment terms computed by each PE of the DA-module are finally loaded into a separate BSG (see Fig. 5) to generate the weight-increment terms in bit-serial order. All the BSGs of the DA-module use the common control signals CTR6 and CTR5 to perform the loading and shifting operations, respectively.

The WBSG is an important block of the proposed structure. It performs three operations: (i) it updates the filter weights using the weight-increment values calculated by the DA-module, (ii) it generates bit-vectors for the DA-module to compute the next block of filter output, and (iii) it gives one circular left-shift to the weight-vectors, as necessitated by the proposed LUT-update scheme. We have shown the LUT updating of the DA-BLMS ADF for $N = 16$ and $L = 4$ in Table I for the first 5 iterations using the proposed LUT-updating scheme. As shown in Table I, for $N = 16$ and $L = 4$, the LUT-matrix has 4 columns (for $M = 4$). The LUTs of all these 4 columns are updated once in a period of 4 iterations. At any given iteration, the LUT-matrix contains the values corresponding to the $N + L - 1 = 19$ most recent input samples, which are used to compute a block of 4 filter outputs. As shown in Table I, during the 5-th iteration the LUT-matrix contains the values corresponding to a set of 19 input samples, and this set of 19 samples is exactly what is required to compute the corresponding block of filter outputs. Similarly, the LUT-matrix contains the values corresponding to the next set of 19 samples during the 6-th iteration, and these samples are exactly those required to compute the filter outputs of that iteration.

The bit-serial structure of the WBSG is shown in Fig. 12. It consists of serial-in serial-out (SISO) SRs and carry-save full-adders (CSFAs), one per filter weight. The SRs are arranged in matrix form, and the filter weights are stored in the SR matrix column-wise, such that each short weight-vector is stored in one column of SRs.
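The update schedule summarized in Table I can be emulated to check that overwriting one column per iteration, combined with a circular left-shift of the weight-vectors, reproduces the direct block-FIR outputs. The sketch below makes several simplifying assumptions that are not from the paper: the weights are held fixed (no adaptation) to isolate the LUT schedule, each LUT column is represented by its input-vectors with plain multiplications in place of table look-ups, and all names and test data are hypothetical.

```python
def fir_block(x, w, k, L):
    """Reference: direct computation of the k-th block of L filter outputs."""
    return [sum(w[n] * x[k * L - i - n] for n in range(len(w))) for i in range(L)]


def column_vectors(x, k, L):
    """The L input-vectors of one column: block k plus L-1 samples of block k-1."""
    return [[x[k * L - i - l] for l in range(L)] for i in range(L)]


def simulate(x, w, L, k_start, iterations):
    """Run the cyclic column-update schedule; return outputs and updated columns."""
    N = len(w)
    M = N // L
    short_w = [w[j * L:(j + 1) * L] for j in range(M)]
    cols = [column_vectors(x, k_start - p, L) for p in range(M)]
    order = list(range(M))   # order[p]: short weight-vector that addresses column p
    target = M - 1           # column currently holding the oldest block
    outputs, updated = [], []
    for t in range(iterations):
        if t > 0:
            cols[target] = column_vectors(x, k_start + t, L)  # overwrite oldest block
            updated.append(target)
            order = order[1:] + order[:1]                     # circular left-shift
            target = (target - 1) % M                         # right to left, cyclic
        y = [sum(s * c
                 for p in range(M)
                 for s, c in zip(cols[p][i], short_w[order[p]]))
             for i in range(L)]
        outputs.append(y)
    return outputs, updated


# Hypothetical data for N = 16 and L = 4 (so M = 4).
x = [(7 * n * n + 3 * n) % 23 - 11 for n in range(48)]
w = [(5 * n) % 7 - 3 for n in range(16)]
outputs, updated = simulate(x, w, L=4, k_start=5, iterations=5)
print(updated)   # [3, 2, 1, 0] (0-based column indices, i.e. 4th column first)
```

Each column is rewritten once every $M = 4$ iterations, and no column content is ever moved; only the mapping between weight-vectors and columns rotates.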
As shown in Table I for, the bit-slices of the weight-vector are received by the PE whose LUTs are to be updated during the -th iteration, and are generated from the first column of filter weights. The weight-vector is to be aligned with the corresponding PE. If, during the -th iteration, the LUT of PE-1 is to be updated, then the first column of the SR-matrix is required to contain the components of weight-vector and the -th column of SRs should contain the components of weight-vector. As shown in Fig. 12, weight-increment values of the -th column of filter coefficients (available in the -th column of the SR-matrix) are obtained from the -th PE, and these values are added with the corresponding filter-weights bit-serially using a carry-save full-adder (CSFA). Results of the CSFA of the -th column constitute a bit-vector of. SR contents are shifted left for clock cycles to generate the shifted weight-vectors in accordance with the proposed LUT-update scheme. The shifting operation of the SRs starts at the -th clock cycle of every iteration, and continues for clock cycles. The control signal CTR5 is used in WBSG to enable the shifting operation. The D flip-flop of each CSFA is cleared during the first clock cycle of every iteration to flush out the final carry of the previous iteration of the weight-update operation.

Fig. 12. Bit-serial structure of weight-update cum bit-slice generator (WBSG) for, and.

B. Structure for Higher Block-Size

To derive a DA-based BLMS structure for higher block sizes using LUTs of 16 words, we can take the block-size to be a multiple of 4, i.e., where is an integer. The structures of EBSG and WBSG of the DA-BLMS filter for (for ) are the same as those of block-size shown in Fig. 11 and Fig. 12, respectively. However, the AC of the LUT-update block and the SC of each PE of the DA-module need to be modified according to the value of. Each SC, in this case, comprises LUTs of 16 words each. The bit-vectors of weight-vectors and error-vectors of bits each are split into segments of 4-bit size, and fed to the LUTs of each SC to read the LUTs in parallel. The values read from the LUTs are added using an AT and subsequently shift-accumulated in the ACC to obtain a partial output. To generate the weight update-values for LUTs, each AC of the LUT-update block in this case comprises AND-TA blocks of size 4 (as shown in Fig. 9). For block-size, each SC involves RAM words and adders along with one ACC and 2 DMUX. Similarly, the LUT-update block involves AND-gates and adders.

V. HARDWARE-TIME COMPLEXITY AND PERFORMANCE COMPARISON

A. Hardware Complexity

The proposed structure comprises one DA-module, one WBSG, one EBSG and a control unit. The DA-module consists of one LUT-update block, PEs, adder-trees of words each, one MUX-array and BSG, where and the block-size. The LUT-update block consists of one IDU and ACs, where the IDU comprises registers of size, and each AC comprises AND-gates and adders. The LUT-update block, therefore, involves registers, adders and AND-gates. Each PE consists of SCs, where each SC comprises LUTs of 16 words each, adders, one ACC, one 1-to-2 line DMUX and number of 2-input XOR-gates (used by the ACC (not shown in Fig. 7) to compute the 1's complement of the LUT-outputs when the bit-vector contains sign-bits), where the ACC involves one adder, one register and one 2-to-1 line DMUX ( ).
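The LUT read and shift-accumulate operation of an SC described above can be sketched behaviourally as follows. This is a minimal, generic distributed-arithmetic example for four operands and one 16-word LUT, not the authors' circuit: the names `build_lut` and `da_inner_product` are hypothetical, the word length is a free parameter, and the sign-bit slice is handled by a plain subtraction rather than by the XOR-based 1's complement logic mentioned in the text.

```python
def build_lut(stored):
    """Build a 16-word LUT from four stored values.

    Word `addr` holds the sum of stored[i] for every bit i that is set in addr.
    """
    assert len(stored) == 4
    return [
        sum(v for i, v in enumerate(stored) if (addr >> i) & 1)
        for addr in range(16)
    ]


def da_inner_product(lut, serial, width):
    """Compute sum(stored[i] * serial[i]) using only LUT reads, shifts and adds.

    `serial` holds four signed integers that fit in `width`-bit two's complement.
    In each cycle, bit k of the four operands forms a 4-bit LUT address.
    """
    mask = (1 << width) - 1
    words = [s & mask for s in serial]  # two's-complement encodings
    acc = 0
    for k in range(width):
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> k) & 1) << i  # bit-slice k becomes the LUT address
        term = lut[addr] << k            # weight the LUT word by 2**k
        if k == width - 1:
            acc -= term                  # sign-bit slice carries weight -2**(width-1)
        else:
            acc += term
    return acc
```

For instance, `da_inner_product(build_lut([3, -1, 4, 2]), [5, -6, 7, -8], 8)` evaluates 3*5 + (-1)*(-6) + 4*7 + 2*(-8) without any multiplier. For longer vectors, several such LUTs would be read in parallel and their outputs summed before accumulation, in the spirit of the adder tree (AT) and ACC mentioned above.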
Each PE, therefore, involves memory words, adders, registers, DMUXes (2-to-1 line) and XOR-gates. Each BSG comprises SRs (bit-level) of size. The MUX-array involves 2-to-1 line MUXes. The DA-module, therefore, involves memory words, adders, D-type flip-flops (FFs), 2-to-1-line MUXes/DMUXes (word-level), AND-gates and XOR-gates. WBSG involves D-type FFs and FAs. EBSG involves D-type FFs and adders. The proposed structure, therefore, requires memory words, adders, FAs, D-type FFs, MUXes/DMUXes (word-level), AND-gates and XOR-gates.

B. Time Complexity

The proposed structure performs four operations sequentially in every iteration. Those are (i) LUT update, (ii) filter output computation, (iii) error calculation and (iv) computation of the weight-increment term. It involves 16 clock cycles to complete the LUT-update operation. It takes clock cycles to calculate partial results of a block of filter output. It calculates a block of filter output from the partial results and then a block of error in one clock cycle. Finally, it takes clock cycles to compute the weight-increment term for the weight vector. In every iteration, the proposed structure processes one block of samples, where one iteration involves clock cycles and the duration of one clock cycle is, where is the delay of one -bit adder. For comparison purposes, we have also estimated the number of clock cycles required by the structures of [10] and [16] for one iteration. We assumed that the read and write operations are performed in two separate clock cycles in a LUT to maintain uniformity in the comparison. The structure of [10] requires 16 clock cycles to update the DA-A-LUT of size 16 words, clock cycles to compute one filter output and 32 clock cycles to update the DA-F-LUT of size 16 words. It involves 48 clock cycles for one iteration and computes one output per iteration, where the duration of one clock cycle is and. Since the structure of [16] does not involve a DA-F-LUT, it requires 16 clock cycles for updating the DA-A-LUT and clock cycles to compute one filter output. The structure of [16], therefore, involves clock cycles for one iteration, where the duration of the clock period is the same as that of [10].

TABLE II GENERAL COMPARISON OF HARDWARE COMPLEXITY OF THE PROPOSED STRUCTURE (FOR BLOCK-SIZE ) AND THE STRUCTURE OF [10] AND [16] (WITH DECOMPOSITION FACTOR 4) AND THE DA-BLMS STRUCTURE OF [18]
LEGEND: ADD: adder, MULT: multiplier, FF: flip-flop, VSH: variable shifter, TR: throughput rate, LAPO: LUT access per output. In addition to the above list of components, the proposed structure involves FAs, 2-input AND-gates and 2-input XOR-gates, where : word length of the sequence, and, : word-length of input sequence,. For the proposed structure,, and in case of [10] and [16],, and in case of [18],,, and block-size, where and are relatively prime to each other.

C. Number of LUT Access

During every iteration, the proposed structure computes filter outputs, and performs write operations for updating the LUTs, LUT read operations for filter output computation and LUT read operations for the computation of weight-increment terms. The number of LUT accesses per output (LAPO) of the structure is, therefore,. Similarly, the LAPO of [10] and [16] are found to be and, respectively, where is the bit-width of the input samples and is the bit-width of all the intermediate and output samples. Note that the LUTs of a DA-based ADF are required to be implemented by RAM, and the total energy consumption of the structure, therefore, increases significantly with LAPO.

D. Performance Comparison

Hardware and time complexities of the proposed structure, the DA-LMS structures of [10] and [16], and the DA-BLMS structure of [18] are listed in Table II for comparison. The structure of [16] is the most efficient one amongst the existing DA-LMS structures. Compared with [16], the proposed structure requires times more LUT words, nearly times more adders, 4/3 times more FFs and offers nearly times higher throughput rate. It involves 16 more LAPO for block-size 4 and less LAPO for block-size 8 than those of [16] for 16-bit internal bit precision. Interestingly, the number of adders of the proposed structure does not increase proportionately with block-size, and the number of flip-flops is independent of block-size. Besides, it does not require variable shifters, unlike those of [10] and [16]. We have estimated the hardware and time complexity of the proposed structure for block-size 4 and 8, and that of [10] and [16], for filter sizes (16, 32 and 64) using the complexity counts of Table II. The estimated values are listed in Table III for comparison. Compared with the structure of [16], the proposed structure for block-size 8 involves 8 times more LUT words and 3.27 times more adders on average for different filter orders, and offers 5.22 times higher throughput. However, it involves 37.5%, 24.4%, 17.8% more flip-flops and 25%, 37.5%, 47.6% less LAPO than those of [16] for filter orders 16, 32, 64, respectively.

E. Simulation Result

To validate the proposed design, we have coded it in VHDL for filter order 16, 32 and 64 with block-size 4 and 8. We have also coded the designs of [10] and [16] for the same filter orders. We have considered and, and synthesized the designs by Synopsys Design Compiler using the TSMC 90 nm CMOS library. Synthesis reports obtained from the Design Compiler are listed in Table IV. Synthesis results are in accordance with the theoretical estimation given in Table III.
The minimum clock periods of the proposed structure and the structure of [16] are slightly higher than that of [10] due to the extra MUX/DMUX in the critical path. As shown in Table IV, the structure of [16] is the most efficient amongst the existing structures. Compared with [16], the proposed structure for block size 4 and 8 involves, respectively, 2.13 and 3.69 times more area on average for different filter orders, and offers nearly 2.61 and 5.22 times higher throughput rate, respectively.
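The throughput ratios quoted above can be checked with a short back-of-the-envelope calculation. The cycle counts for filter-output and weight-increment computation are missing from this transcription, so the sketch below assumes that each of them equals the word length, that the word length is 16 bits, and that both structures run at the same clock period. None of these values is confirmed by the surviving text, and `throughput_gain` is a hypothetical helper name.

```python
def throughput_gain(block_size, word_length=16, lut_update_cycles=16):
    """Estimate the throughput ratio of the block structure over the structure of [16].

    Assumed cycle budget per iteration (not confirmed by the source):
      proposed: LUT update + filter output + 1 (output/error) + weight increment
      [16]:     LUT update + filter output, producing one output per iteration
    Equal clock periods are assumed for both structures.
    """
    proposed_cycles = lut_update_cycles + word_length + 1 + word_length
    reference_cycles = lut_update_cycles + word_length
    outputs_per_cycle_proposed = block_size / proposed_cycles
    outputs_per_cycle_reference = 1 / reference_cycles
    return outputs_per_cycle_proposed / outputs_per_cycle_reference


for block_size in (4, 8):
    print(block_size, round(throughput_gain(block_size), 2))
```

Under these assumptions the ratio comes out as 128/49 and 256/49 for block-size 4 and 8, which round to the 2.61 and 5.22 quoted in the text. The agreement suggests that the assumed cycle counts are plausible, but it does not establish them.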

TABLE III HARDWARE AND TIME-COMPLEXITY OF PROPOSED STRUCTURE AND STRUCTURE OF [10] AND [16] FOR DIFFERENT SIZE FILTERS

TABLE IV COMPARISON OF AREA, DELAY, AND POWER COMPLEXITIES OBTAINED FROM SYNTHESIS RESULT OF PROPOSED STRUCTURE AND STRUCTURE OF [10] AND [16]

We have estimated ADP, PPO and energy per output (EPO) at 20 MHz clock. As shown in Table IV, for block-size 4, the proposed structure has 17.47%, 18.49%, 13.66% less ADP than that of [16] for filter order 16, 32 and 64, respectively. For block-size 8, it has 31.6% less ADP than [16] on average for different filter orders. For block-size 4, it consumes 27.5%, 28.8% and 24.6% less EPO than that of [16] for filter order 16, 32 and 64, respectively. Similarly, for block-size 8, it consumes, respectively, 40%, 39.8% and 37.4% less EPO than the other for 16, 32 and 64 order filters. One can extrapolate these results to obtain the approximate values of ADP, PPO and EPO of the proposed structure for filter order greater than 64. One can also extrapolate these observations to obtain an approximate estimate of the advantages of the proposed structure for filter order greater than 64.

VI. CONCLUSION

We have derived a DA formulation of the BLMS algorithm where both convolution and correlation are performed using a common LUT for the computation of filter outputs and weight-increment terms, respectively. This results in a significant saving of LUT words and adders, which constitute the major hardware components in DA-based computing structures. We have also suggested a novel LUT-updating scheme to update the LUT contents for the DA-based BLMS ADF, where only one set of LUTs out of sets needs to be modified in every iteration, such that LUT contents are modified once in every iterations, where, is the filter length and is the input block-size.
Using the proposed scheme, we have derived a parallel architecture for the implementation of the DA-based BLMS ADF. Unlike the existing DA-based LMS structure, the number of adders required by the proposed structure does not increase linearly with. Compared with the best of the existing DA-based LMS designs, the proposed one involves nearly times more adders and times more LUT words, and offers nearly times throughput. It requires nearly 25% more flip-flops irrespective of the block-size, but does not involve variable shifters like the others. It involves fewer LUT accesses per output than the existing structure for block-size higher than 4. This is a major advantage of the proposed structure for reducing its ADP and EPO when implemented for large order ADF, and for higher block-sizes. For block-size 8 and filter length 64, the proposed structure involves 2.47 times more adders, 15% more flip-flops, 43% less LAPO than the best of the existing structures, and offers 5.22 times higher throughput. The ASIC synthesis result shows that the proposed structure for filter order 64 has almost 14% and 30% less ADP and 25% and 37% less EPO than the best of the existing structures for block size 4 and 8, respectively.

REFERENCES

[1] S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ: Wiley-Interscience.
[2] R. Haimi-Cohen, H. Herzberg, and Y. Beery, "Delayed adaptive LMS filtering: Current results," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Albuquerque, NM, Apr. 1990, pp.
[3] M. D. Meyer and D. P. Agrawal, "A modular pipelined implementation of a delayed LMS transversal adaptive filter," in Proc. IEEE Int. Symp. Circuits Syst., New Orleans, LA, May 1990, pp.
[4] V. Visvnathan and S. Ramanathan, "A modular systolic architecture for delayed least mean square adaptive filtering," in Proc. IEEE Int. Conf. VLSI Des., Bangalore, 1995, pp.
[5] R. D. Poltmann, "Conversion of the delayed LMS algorithm into the LMS algorithm," IEEE Signal Process. Lett., vol. 2, p. 223, Dec.
[6] S. C. Douglas, Q. Zhu, and K. F. Smith, "A pipelined LMS adaptive FIR filter architecture without adaptive delay," IEEE Trans. Signal Process., vol. 46, pp., Mar.
[7] L. D. Van and W. S. Feng, "Efficient systolic architectures for 1-D and 2-D DLMS adaptive digital filters," in Proc. IEEE Asia Pacific Conf. Circuits Syst., Tianjin, China, Dec. 2000, pp.
[8] L. D. Van and W. S. Feng, "An efficient architecture for the DLMS adaptive filters and its applications," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 4, pp., Apr.
[9] G. A. Clark, S. K. Mitra, and S. R. Parker, "Block implementation of adaptive digital filters," IEEE Trans. Circuit Syst., vol. 28, pp., Jun.
[10] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "LMS adaptive filters using distributed arithmetic for high throughput," IEEE Trans. Circuits Syst., vol. 52, no. 7, pp., Jul.
[11] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "A novel high performance distributed arithmetic adaptive filter implementation on an FPGA," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2004, vol. 5, p. V
[12] S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Mag., vol. 6, pp. 4-19, Jul.
[13] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "An FPGA implementation for a high throughput adaptive filter using distributed arithmetic," in Proc. 12th Annu. IEEE Symp. Field-Programmable Custom Comput. Mach., 2004, pp.
[14] W. Huang and D. V. Anderson, "Adaptive filters using modified sliding-block distributed arithmetic with offset binary coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009, pp.
[15] B. K. Mohanty and P. K. Meher, "Delayed block LMS algorithm and concurrent architecture for high-speed implementation of adaptive FIR filters," presented at the IEEE Region 10 TENCON 2008 Conf., Hyderabad, India, Nov.
[16] R. Guo and L. S. DeBrunner, "Two high-performance adaptive filter implementation schemes using distributed arithmetic," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 9, pp., Sep.
[17] S. Baghel and R. Shaik, "FPGA implementation of fast block LMS adaptive filter using distributed arithmetic for high-throughput," in Proc. Int. Conf. Commun. Signal Process. (ICCSP), Feb. 2011, pp.
[18] S. Baghel and R. Shaik, "Low power and less complex implementation of fast block LMS adaptive filter using distributed arithmetic," in Proc. IEEE Students Technol. Symp., Jan. 2011, pp.
[19] R. Jayashri, H. Chitra, H. Kusuma, A. V. Pavitra, and V. Chandrakanth, "Memory based architecture to implement simplified block LMS algorithm on FPGA," in Proc. Int. Conf. Commun. Signal Process. (ICCSP), Feb. 2011, pp.
[20] Q. Shen and A. S. Spanias, "Time and frequency domain X block LMS algorithm for single channel active noise control," Control Eng. J., vol. 44, no. 6, pp.
[21] D. P. Das, G. Panda, and S. M. Kuo, "New block filtered-x LMS algorithms for active noise control systems," IEE Signal Process., vol. 1, no. 2, pp., Jun.
[22] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley.
[23] C. S. Burrus, "Index mappings for multidimensional formulation of the DFT and convolution," IEEE Trans. Acoust., Speech, Signal Process., vol. 25, pp., Jun.

Basant K. Mohanty (M'06-SM'11) received the M.Sc. degree in physics from Sambalpur University, India, in 1989 and the Ph.D. degree in the field of VLSI for digital signal processing from Berhampur University, Orissa. In 2001, he joined the Electrical and Electronic Engineering Department, BITS Pilani, Rajasthan, as a Lecturer. Then, he joined the Department of Electronics and Communication Engineering, Mody Institute of Education Research (Deemed University), Rajasthan, as an Assistant Professor. In 2003, he joined Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, where he became Associate Professor in 2005 and later full Professor. His research interests include design and implementation of low-power and high-performance systems for multimedia applications, multi-core processor design and algorithms for concurrent processing. He has published nearly 40 technical papers. Dr. Mohanty is a lifetime member of The Institution of Electronics and Telecommunication Engineering, New Delhi, India. He was the recipient of the Rashtriya Gaurav Award conferred by India International Friendship Society, New Delhi, India.

Pramod Kumar Meher (SM'03) received the M.Sc. degree in physics and the Ph.D. degree in science from Sambalpur University, India, in 1978 and 1996, respectively. Currently, he is a Senior Scientist with the Institute for Infocomm Research, Singapore, and Adjunct Professor with the School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, India. Previously, he was a Professor of Computer Applications with Utkal University, India, from 1997 to 2002, and a Reader in electronics with Berhampur University, India, from 1993. His research interests include design of dedicated and reconfigurable architectures for computation-intensive algorithms pertaining to signal, image and video processing, communication, bio-informatics and intelligent computing. He has contributed nearly 200 technical papers to various reputed journals and conference proceedings. Dr. Meher has served as a speaker for the Distinguished Lecturer Program (DLP) of IEEE Circuits Systems Society and Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS. Currently, he is serving as Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and Journal of Circuits, Systems, and Signal Processing. He was the recipient of the Samanta Chandrasekhar Award for excellence in research in engineering and technology for 1999.


More information

Efficient Architecture for Flexible Prescaler Using Multimodulo Prescaler

Efficient Architecture for Flexible Prescaler Using Multimodulo Prescaler Efficient Architecture for Flexible Using Multimodulo G SWETHA, S YUVARAJ Abstract This paper, An Efficient Architecture for Flexible Using Multimodulo is an architecture which is designed from the proposed

More information

A Low Power Delay Buffer Using Gated Driver Tree

A Low Power Delay Buffer Using Gated Driver Tree IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) ISSN: 2319 4200, ISBN No. : 2319 4197 Volume 1, Issue 4 (Nov. - Dec. 2012), PP 26-30 A Low Power Delay Buffer Using Gated Driver Tree Kokkilagadda

More information

OF AN ADVANCED LUT METHODOLOGY BASED FIR FILTER DESIGN PROCESS

OF AN ADVANCED LUT METHODOLOGY BASED FIR FILTER DESIGN PROCESS IMPLEMENTATION OF AN ADVANCED LUT METHODOLOGY BASED FIR FILTER DESIGN PROCESS 1 G. Sowmya Bala 2 A. Rama Krishna 1 PG student, Dept. of ECM. K.L.University, Vaddeswaram, A.P, India, 2 Assistant Professor,

More information

A Review on Hybrid Adders in VHDL Payal V. Mawale #1, Swapnil Jain *2, Pravin W. Jaronde #3

A Review on Hybrid Adders in VHDL Payal V. Mawale #1, Swapnil Jain *2, Pravin W. Jaronde #3 A Review on Hybrid Adders in VHDL Payal V. Mawale #1, Swapnil Jain *2, Pravin W. Jaronde #3 #1 Electronics & Communication, RTMNU. *2 Electronics & Telecommunication, RTMNU. #3 Electronics & Telecommunication,

More information

Implementation of High Speed Adder using DLATCH

Implementation of High Speed Adder using DLATCH International Journal of Emerging Engineering Research and Technology Volume 3, Issue 12, December 2015, PP 162-172 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Implementation of High Speed Adder using

More information

Efficient Implementation of Multi Stage SQRT Carry Select Adder

Efficient Implementation of Multi Stage SQRT Carry Select Adder International Journal of Research Studies in Science, Engineering and Technology Volume 2, Issue 8, August 2015, PP 31-36 ISSN 2349-4751 (Print) & ISSN 2349-476X (Online) Efficient Implementation of Multi

More information

The main design objective in adder design are area, speed and power. Carry Select Adder (CSLA) is one of the fastest

The main design objective in adder design are area, speed and power. Carry Select Adder (CSLA) is one of the fastest ISSN: 0975-766X CODEN: IJPTFI Available Online through Research Article www.ijptonline.com IMPLEMENTATION OF FAST SQUARE ROOT SELECT WITH LOW POWER CONSUMPTION V.Elanangai*, Dr. K.Vasanth Department of

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal

International Journal of Engineering Research-Online A Peer Reviewed International Journal RESEARCH ARTICLE ISSN: 2321-7758 VLSI IMPLEMENTATION OF SERIES INTEGRATOR COMPOSITE FILTERS FOR SIGNAL PROCESSING MURALI KRISHNA BATHULA Research scholar, ECE Department, UCEK, JNTU Kakinada ABSTRACT The

More information

ISSN:

ISSN: 427 AN EFFICIENT 64-BIT CARRY SELECT ADDER WITH REDUCED AREA APPLICATION CH PALLAVI 1, VSWATHI 2 1 II MTech, Chadalawada Ramanamma Engg College, Tirupati 2 Assistant Professor, DeptofECE, CREC, Tirupati

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

Optimizing area of local routing network by reconfiguring look up tables (LUTs)

Optimizing area of local routing network by reconfiguring look up tables (LUTs) Vol.2, Issue.3, May-June 2012 pp-816-823 ISSN: 2249-6645 Optimizing area of local routing network by reconfiguring look up tables (LUTs) Sathyabhama.B 1 and S.Sudha 2 1 M.E-VLSI Design 2 Dept of ECE Easwari

More information

Designing an Efficient and Secured LUT Approach for Area Based Occupations

Designing an Efficient and Secured LUT Approach for Area Based Occupations Designing an Efficient and Secured LUT Approach for Area Based Occupations 1 D. Jahnavi, 2 Y. Ravikiran varma 1 M.Tech scholar, E.C.E, Sreenivasa institute of technology and management studies, Chittoor

More information

Design of Polar List Decoder using 2-Bit SC Decoding Algorithm V Priya 1 M Parimaladevi 2

Design of Polar List Decoder using 2-Bit SC Decoding Algorithm V Priya 1 M Parimaladevi 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 03, 2015 ISSN (online): 2321-0613 V Priya 1 M Parimaladevi 2 1 Master of Engineering 2 Assistant Professor 1,2 Department

More information

Logic Design II (17.342) Spring Lecture Outline

Logic Design II (17.342) Spring Lecture Outline Logic Design II (17.342) Spring 2012 Lecture Outline Class # 03 February 09, 2012 Dohn Bowden 1 Today s Lecture Registers and Counters Chapter 12 2 Course Admin 3 Administrative Admin for tonight Syllabus

More information

A Fast Constant Coefficient Multiplier for the XC6200

A Fast Constant Coefficient Multiplier for the XC6200 A Fast Constant Coefficient Multiplier for the XC6200 Tom Kean, Bernie New and Bob Slous Xilinx Inc. Abstract. We discuss the design of a high performance constant coefficient multiplier on the Xilinx

More information

VLSI Technology used in Auto-Scan Delay Testing Design For Bench Mark Circuits

VLSI Technology used in Auto-Scan Delay Testing Design For Bench Mark Circuits VLSI Technology used in Auto-Scan Delay Testing Design For Bench Mark Circuits N.Brindha, A.Kaleel Rahuman ABSTRACT: Auto scan, a design for testability (DFT) technique for synchronous sequential circuits.

More information

Logic and Computer Design Fundamentals. Chapter 7. Registers and Counters

Logic and Computer Design Fundamentals. Chapter 7. Registers and Counters Logic and Computer Design Fundamentals Chapter 7 Registers and Counters Registers Register a collection of binary storage elements In theory, a register is sequential logic which can be defined by a state

More information

Research Article Low Power 256-bit Modified Carry Select Adder

Research Article Low Power 256-bit Modified Carry Select Adder Research Journal of Applied Sciences, Engineering and Technology 8(10): 1212-1216, 2014 DOI:10.19026/rjaset.8.1086 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:

More information

This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright.

This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright. This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright. The final version is published and available at IET Digital Library

More information

High Speed 8-bit Counters using State Excitation Logic and their Application in Frequency Divider

High Speed 8-bit Counters using State Excitation Logic and their Application in Frequency Divider High Speed 8-bit Counters using State Excitation Logic and their Application in Frequency Divider Ranjith Ram. A 1, Pramod. P 2 1 Department of Electronics and Communication Engineering Government College

More information

K. Phanindra M.Tech (ES) KITS, Khammam, India

K. Phanindra M.Tech (ES) KITS, Khammam, India Volume 7, Issue 5, May 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com LUT Optimization

More information

Figure 1.LFSR Architecture ( ) Table 1. Shows the operation for x 3 +x+1 polynomial.

Figure 1.LFSR Architecture ( ) Table 1. Shows the operation for x 3 +x+1 polynomial. High-speed Parallel Architecture and Pipelining for LFSR Vinod Mukati PG (M.TECH. VLSI engineering) student, SGVU Jaipur (Rajasthan). Vinodmukati9@gmail.com Abstract Linear feedback shift register plays

More information

FPGA Realization of High Speed FIR Filter based on Distributed Arithmetic

FPGA Realization of High Speed FIR Filter based on Distributed Arithmetic KGShanthi et al / International Journal of Engineering and Technology (IJET) FPGA Realization of High Speed FIR Filter ased on istriuted Arithmetic KGShanthi #1, rnnagarajan *2, CKalieswari #3 # epartment

More information

International Journal Of Global Innovations -Vol.6, Issue.I Paper Id: SP-V6-I1-P11 ISSN Online:

International Journal Of Global Innovations -Vol.6, Issue.I Paper Id: SP-V6-I1-P11 ISSN Online: LOW POWER SHIFT REGISTERS USING CLOCK GATING TECHNIQUE #1 G.SHIREESHA, M.Tech student, #2 T.NAGESWARRAO, Assistant Professor, #3 S.NAGESWARA RAO, Assistant Professor, Dept of ECE, SRI VENKATESWARA ENGINEERING

More information

Flip Flop. S-R Flip Flop. Sequential Circuits. Block diagram. Prepared by:- Anwar Bari

Flip Flop. S-R Flip Flop. Sequential Circuits. Block diagram. Prepared by:- Anwar Bari Sequential Circuits The combinational circuit does not use any memory. Hence the previous state of input does not have any effect on the present state of the circuit. But sequential circuit has memory

More information

Operating Bio-Implantable Devices in Ultra-Low Power Error Correction Circuits: using optimized ACS Viterbi decoder

Operating Bio-Implantable Devices in Ultra-Low Power Error Correction Circuits: using optimized ACS Viterbi decoder Operating Bio-Implantable Devices in Ultra-Low Power Error Correction Circuits: using optimized ACS Viterbi decoder Roshini R, Udhaya Kumar C, Muthumani D Abstract Although many different low-power Error

More information

Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures

Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures Jörn Gause Abstract This paper presents an investigation of Look-Up Table (LUT) based Field Programmable Gate Arrays (FPGAs)

More information

128 BIT CARRY SELECT ADDER USING BINARY TO EXCESS-ONE CONVERTER FOR DELAY REDUCTION AND AREA EFFICIENCY

128 BIT CARRY SELECT ADDER USING BINARY TO EXCESS-ONE CONVERTER FOR DELAY REDUCTION AND AREA EFFICIENCY 128 BIT CARRY SELECT ADDER USING BINARY TO EXCESS-ONE CONVERTER FOR DELAY REDUCTION AND AREA EFFICIENCY 1 Mrs.K.K. Varalaxmi, M.Tech, Assoc. Professor, ECE Department, 1varuhello@Gmail.Com 2 Shaik Shamshad

More information

EFFICIENT DESIGN OF SHIFT REGISTER FOR AREA AND POWER REDUCTION USING PULSED LATCH

EFFICIENT DESIGN OF SHIFT REGISTER FOR AREA AND POWER REDUCTION USING PULSED LATCH EFFICIENT DESIGN OF SHIFT REGISTER FOR AREA AND POWER REDUCTION USING PULSED LATCH 1 Kalaivani.S, 2 Sathyabama.R 1 PG Scholar, 2 Professor/HOD Department of ECE, Government College of Technology Coimbatore,

More information

FPGA Implementation of DA Algritm for Fir Filter

FPGA Implementation of DA Algritm for Fir Filter International Journal of Computational Engineering Research Vol, 03 Issue, 8 FPGA Implementation of DA Algritm for Fir Filter 1, Solmanraju Putta, 2, J Kishore, 3, P. Suresh 1, M.Tech student,assoc. Prof.,Professor

More information

Advanced Devices. Registers Counters Multiplexers Decoders Adders. CSC258 Lecture Slides Steve Engels, 2006 Slide 1 of 20

Advanced Devices. Registers Counters Multiplexers Decoders Adders. CSC258 Lecture Slides Steve Engels, 2006 Slide 1 of 20 Advanced Devices Using a combination of gates and flip-flops, we can construct more sophisticated logical devices. These devices, while more complex, are still considered fundamental to basic logic design.

More information

Retiming Sequential Circuits for Low Power

Retiming Sequential Circuits for Low Power Retiming Sequential Circuits for Low Power José Monteiro, Srinivas Devadas Department of EECS MIT, Cambridge, MA Abhijit Ghosh Mitsubishi Electric Research Laboratories Sunnyvale, CA Abstract Switching

More information

Clock Gating Aware Low Power ALU Design and Implementation on FPGA

Clock Gating Aware Low Power ALU Design and Implementation on FPGA Clock Gating Aware Low ALU Design and Implementation on FPGA Bishwajeet Pandey and Manisha Pattanaik Abstract This paper deals with the design and implementation of a Clock Gating Aware Low Arithmetic

More information

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL

Random Access Scan. Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL Random Access Scan Veeraraghavan Ramamurthy Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL ramamve@auburn.edu Term Paper for ELEC 7250 (Spring 2005) Abstract: Random Access

More information

Previous Lecture Sequential Circuits. Slide Summary of contents covered in this lecture. (Refer Slide Time: 01:55)

Previous Lecture Sequential Circuits. Slide Summary of contents covered in this lecture. (Refer Slide Time: 01:55) Previous Lecture Sequential Circuits Digital VLSI System Design Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture No 7 Sequential Circuit Design Slide

More information

An Efficient Viterbi Decoder Architecture

An Efficient Viterbi Decoder Architecture IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume, Issue 3 (May. Jun. 013), PP 46-50 e-issn: 319 400, p-issn No. : 319 4197 An Efficient Viterbi Decoder Architecture Kalpana. R 1, Arulanantham.

More information

Design on CIC interpolator in Model Simulator

Design on CIC interpolator in Model Simulator Design on CIC interpolator in Model Simulator Manjunathachari k.b 1, Divya Prabha 2, Dr. M Z Kurian 3 M.Tech [VLSI], Sri Siddhartha Institute of Technology, Tumkur, Karnataka, India 1 Asst. Professor,

More information

Power Reduction Techniques for a Spread Spectrum Based Correlator

Power Reduction Techniques for a Spread Spectrum Based Correlator Power Reduction Techniques for a Spread Spectrum Based Correlator David Garrett (garrett@virginia.edu) and Mircea Stan (mircea@virginia.edu) Center for Semicustom Integrated Systems University of Virginia

More information

An Improved Recursive and Non-recursive Comb Filter for DSP Applications

An Improved Recursive and Non-recursive Comb Filter for DSP Applications eonode Inc From the SelectedWorks of Dr. oita Teymouradeh, CEng. 2006 An Improved ecursive and on-recursive Comb Filter for DSP Applications oita Teymouradeh Masuri Othman Available at: https://works.bepress.com/roita_teymouradeh/4/

More information

Available online at ScienceDirect. Procedia Technology 24 (2016 )

Available online at   ScienceDirect. Procedia Technology 24 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1155 1162 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST 2015) FPGA Implementation

More information

VLSI System Testing. BIST Motivation

VLSI System Testing. BIST Motivation ECE 538 VLSI System Testing Krish Chakrabarty Built-In Self-Test (BIST): ECE 538 Krish Chakrabarty BIST Motivation Useful for field test and diagnosis (less expensive than a local automatic test equipment)

More information

Improve Performance of Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flop

Improve Performance of Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flop Sumant Kumar et al. 2016, Volume 4 Issue 1 ISSN (Online): 2348-4098 ISSN (Print): 2395-4752 International Journal of Science, Engineering and Technology An Open Access Journal Improve Performance of Low-Power

More information

Fully Pipelined High Speed SB and MC of AES Based on FPGA

Fully Pipelined High Speed SB and MC of AES Based on FPGA Fully Pipelined High Speed SB and MC of AES Based on FPGA S.Sankar Ganesh #1, J.Jean Jenifer Nesam 2 1 Assistant.Professor,VIT University Tamil Nadu,India. 1 s.sankarganesh@vit.ac.in 2 jeanjenifer@rediffmail.com

More information

Implementation of BIST Test Generation Scheme based on Single and Programmable Twisted Ring Counters

Implementation of BIST Test Generation Scheme based on Single and Programmable Twisted Ring Counters IOSR Journal of Mechanical and Civil Engineering (IOSR-JMCE) e-issn: 2278-1684, p-issn: 2320-334X Implementation of BIST Test Generation Scheme based on Single and Programmable Twisted Ring Counters N.Dilip

More information

Hardware Implementation of Viterbi Decoder for Wireless Applications

Hardware Implementation of Viterbi Decoder for Wireless Applications Hardware Implementation of Viterbi Decoder for Wireless Applications Bhupendra Singh 1, Sanjeev Agarwal 2 and Tarun Varma 3 Deptt. of Electronics and Communication Engineering, 1 Amity School of Engineering

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences Introductory Digital Systems Lab (6.111) Quiz #2 - Spring 2003 Prof. Anantha Chandrakasan and Prof. Don

More information

THE CAPABILITY to display a large number of gray

THE CAPABILITY to display a large number of gray 292 JOURNAL OF DISPLAY TECHNOLOGY, VOL. 2, NO. 3, SEPTEMBER 2006 Integer Wavelets for Displaying Gray Shades in RMS Responding Displays T. N. Ruckmongathan, U. Manasa, R. Nethravathi, and A. R. Shashidhara

More information

Abstract 1. INTRODUCTION. Cheekati Sirisha, IJECS Volume 05 Issue 10 Oct., 2016 Page No Page 18532

Abstract 1. INTRODUCTION. Cheekati Sirisha, IJECS Volume 05 Issue 10 Oct., 2016 Page No Page 18532 www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 10 Oct. 2016, Page No. 18532-18540 Pulsed Latches Methodology to Attain Reduced Power and Area Based

More information

Chapter 4. Logic Design

Chapter 4. Logic Design Chapter 4 Logic Design 4.1 Introduction. In previous Chapter we studied gates and combinational circuits, which made by gates (AND, OR, NOT etc.). That can be represented by circuit diagram, truth table

More information

L12: Reconfigurable Logic Architectures

L12: Reconfigurable Logic Architectures L12: Reconfigurable Logic Architectures Acknowledgements: Materials in this lecture are courtesy of the following sources and are used with permission. Frank Honore Prof. Randy Katz (Unified Microelectronics

More information

Design and Analysis of Modified Fast Compressors for MAC Unit

Design and Analysis of Modified Fast Compressors for MAC Unit Design and Analysis of Modified Fast Compressors for MAC Unit Anusree T U 1, Bonifus P L 2 1 PG Student & Dept. of ECE & Rajagiri School of Engineering & Technology 2 Assistant Professor & Dept. of ECE

More information

An optimized implementation of 128 bit carry select adder using binary to excess-one converter for delay reduction and area efficiency

An optimized implementation of 128 bit carry select adder using binary to excess-one converter for delay reduction and area efficiency Journal From the SelectedWorks of Journal December, 2014 An optimized implementation of 128 bit carry select adder using binary to excess-one converter for delay reduction and area efficiency P. Manga

More information

FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique

FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique Dr. Dhafir A. Alneema (1) Yahya Taher Qassim (2) Lecturer Assistant Lecturer Computer Engineering Dept.

More information

Design and Implementation of High Speed 256-Bit Modified Square Root Carry Select Adder

Design and Implementation of High Speed 256-Bit Modified Square Root Carry Select Adder Design and Implementation of High Speed 256-Bit Modified Square Root Carry Select Adder Muralidharan.R [1], Jodhi Mohana Monica [2], Meenakshi.R [3], Lokeshwaran.R [4] B.Tech Student, Department of Electronics

More information

Chapter 3 Unit Combinational

Chapter 3 Unit Combinational EE 200: Digital Logic Circuit Design Dr Radwan E Abdel-Aal, COE Logic and Computer Design Fundamentals Chapter 3 Unit Combinational 5 Registers Logic and Design Counters Part Implementation Technology

More information