Distributed Arithmetic Unit Design for FIR Filter


B. Ayyappa Reddy, M.Tech Student, MITS-Madanapalle.
G. Sambasiva Rao, Asst. Prof., MITS-Madanapalle.

ABSTRACT: In this paper, different Distributed Arithmetic (DA) architectures are proposed for the Finite Impulse Response (FIR) filter. The FIR filter is a core building block of digital signal processing, and it can be designed with either a Multiply-Accumulate Circuit (MAC) or DA. A MAC consumes more power and area because of its multiplier and adder circuits. The proposed distributed arithmetic design is run-time reconfigurable. Implementation results are provided to demonstrate high-speed, low-power architectures. The different DA architectures are implemented in Verilog and verified via simulation. For the 16-tap FIR filter, the LUT-Less2 architecture achieves a 50% improvement in power dissipation and area, and the separated-LUT DA architecture achieves a 49% improvement in delay.

Keywords: Distributed Arithmetic (DA), Finite Impulse Response (FIR) Filter, Multiply-Accumulate Circuit (MAC).

Introduction: In recent years there has been a growing trend to implement digital signal processing functions in Field Programmable Gate Arrays (FPGAs). Finite Impulse Response (FIR) filters are among the most frequently used units in a digital signal processing system. An FIR filter with exactly linear phase is easy to design, and it can be realized in both recursive and non-recursive structures. Generally, direct implementation of an N-tap FIR filter requires N multiply-and-accumulate (MAC) blocks, which are expensive to implement in an FPGA because of resource usage and logic complexity [1]. To resolve this issue, Distributed Arithmetic, a multiplier-less architecture, was introduced. Implementing multipliers in the logic fabric of the FPGA is costly in both logic complexity and area, especially when the filter size is large.
Modern FPGAs have dedicated DSP blocks that alleviate this problem, but for large filter sizes the challenge of reducing area and complexity still remains. Distributed Arithmetic was introduced by Croisier for FIR filters to overcome the difficulties of the MAC. DA is a multiplier-less architecture based on the 2's complement binary representation of data: partial results are pre-computed and stored in a LUT, and the bit positions are reordered [2].

DA implementations can be classified into two types: RAM-based and ROM-based. The two multiplier-less techniques are the conversion-based approach and the memory-based approach, which uses memories (RAMs, ROMs) or look-up tables (LUTs). The LUTs store pre-computed values of coefficient operations [1]. In the ROM-based method, the pre-computed values stored in the LUT favor a low-power design. Such memory-based structures are very efficient in power and area, but at the expense of predefined, fixed filter coefficients, which limits their scope of application.

The RAM-based scheme is an alternative way to implement the FIR filter. In the RAM model, the filter coefficients are stored as memory contents, which allows them to be adjusted and changed during the run time of the filter for different applications. Compared with the ROM-based design, power consumption and area are the major concerns that motivate researchers.

Three basic constraints govern FIR filter design and performance: speed (run-time clock frequency), power, and area. DA improves power and area compared with the MAC unit. This work also targets reconfigurable or adaptive filter designs, so as to combine the efficiency of a ROM-based implementation with the flexibility of a RAM-based one. The proposed DA architecture targets the circuit switching activity of the most active and power-hungry units.
www.ijmetmr.com Page 7

II. Background Concept:

A. Multiply and Accumulate: The MAC operation is common in digital signal processing algorithms and is a major unit in filter design. The MAC unit computes the product of two numbers and adds that product to an accumulator [1]:

$$p \leftarrow p + (q \times r) \tag{1}$$

In equation (1), p is the accumulator output, q is the input, and r is the coefficient.

B. Distributed Arithmetic: Distributed Arithmetic is an extension of the multiply-and-accumulate unit (MAC). It is an efficient, bit-serial technique for computing an inner product (a sum of products, or multiply-accumulate). Its advantage is the efficiency of mechanization. Equation (5), which follows from the general sum-of-products equation, describes a DA computation. Consider the bracketed term $\sum_{i=1}^{k} a_i b_{in}$. Because every bit $b_{in}$ takes only the values 0 and 1, the term has only $2^k$ possible values. These values can be computed on-line (using a RAM) or pre-computed and stored in a ROM. The input data is used to directly address the memory, and the memory output is accumulated into the result. After N such cycles, the accumulated output holds the final result [3].

C. FIR Filter Implementation: An FIR filter can be implemented with either MAC or DA units; of the two, DA is among the most widely used methods. A K-tap FIR filter is represented as a sum of products in which x[k] is the input data [1, 4] and h[k] is the filter coefficient. Generally, direct implementation of a K-tap FIR filter requires K MAC blocks, as shown in Fig. 1 [1].

Fig. 1. Block diagram of conventional tapped FIR filter.

Fig. 2 shows another implementation of the FIR filter, based on the DA approach. The DA architecture consists of three parts: the DA-LUT, the shift register, and the adder/shifter. The filter coefficients are pre-stored in the DA-LUT and addressed by the input data [1].
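The step from a plain sum of products to the bracketed term can be sketched in its standard textbook form. What follows is a reconstruction under stated assumptions, not the authors' own equations: the symbols a_i, b_in, and k are taken from the text, while the word length N, the fractional two's complement scaling, and the sign-bit name b_i0 are assumed.

```latex
% Inner product of k coefficients a_i with k data words x_i
y = \sum_{i=1}^{k} a_i x_i

% Each x_i written as an N-bit two's complement fraction (b_{i0} is the sign bit)
x_i = -b_{i0} + \sum_{n=1}^{N-1} b_{in} 2^{-n}

% Substituting and exchanging the order of summation
y = \sum_{n=1}^{N-1} \left[ \sum_{i=1}^{k} a_i b_{in} \right] 2^{-n}
    + \sum_{i=1}^{k} a_i \left( -b_{i0} \right)
```

In this form the bracketed sum depends only on the k bits b_1n ... b_kn, which is why a table of 2^k entries suffices, and the sign-bit term is handled by subtracting instead of adding on the final cycle.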

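To make the LUT-addressing idea concrete, here is a minimal software sketch of a bit-serial DA inner product. It is a behavioral model under assumptions, not the authors' Verilog: the function names, the integer (rather than fractional) two's complement scaling, the 8-bit default width, and the sample values are all illustrative.

```python
def build_da_lut(coeffs):
    """Pre-compute the 2^K partial sums of the coefficients.

    Entry `addr` holds the sum of coeffs[i] for every bit i that is set in addr.
    """
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, samples, width=8):
    """Bit-serial DA evaluation of sum(coeffs[i] * samples[i]).

    `samples` must be signed integers that fit in `width`-bit two's complement.
    """
    lut = build_da_lut(coeffs)
    mask = (1 << width) - 1
    words = [s & mask for s in samples]      # two's complement bit patterns
    acc = 0
    for n in range(width):                   # one LUT access per bit position
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> n) & 1) << i      # bit n of each sample forms the address
        term = lut[addr] << n                # weight the partial sum by 2^n
        if n == width - 1:
            acc -= term                      # the sign bit carries negative weight
        else:
            acc += term
    return acc


if __name__ == "__main__":
    h = [3, -1, 4, 2]        # illustrative coefficients, not taken from the paper
    x = [10, -7, 5, -128]    # illustrative 8-bit samples
    print(da_inner_product(h, x), sum(a * b for a, b in zip(h, x)))
```

The loop body contains no multiplication between data and coefficients: each cycle is one table lookup, one shift, and one add or subtract, which is the property that makes DA attractive on FPGA logic.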
Fig. 2. Original LUT-based DA representation of 4-tap FIR filter.

III. Proposed Distributed Arithmetic Unit Design:

A. LUT-Less1 DA Architecture: In Fig. 2, the lower half of the LUT is the same as the sum of the upper half of the LUT and h[3]. By using a 2:1 multiplexer and an adder, the DA-LUT unit can therefore be reduced to half its size [1, 2].

Fig. 3. LUT-Less1 DA Architecture for 4-tap FIR filter.

B. LUT-Less2 DA Architecture: Applying the same LUT reduction process repeatedly gives the LUT-Less2 DA architecture, which dramatically reduces the memory. In this architecture all of the LUT units are replaced by multiplexers and adders. Because the LUT is replaced by combinational logic, the filter performance is affected. Fig. 4 shows the LUT-Less2 DA architecture for a 4-tap FIR filter. To reduce the delay, the carry-save adder is replaced with a carry-lookahead adder.

Fig. 4. LUT-Less2 DA Architecture for 4-tap FIR filter.

C. Separated-LUT DA Architecture: As the filter size increases, the hardware cost of the memory in the DA architecture grows exponentially. A K-tap FIR filter can be broken down into N smaller FIR filters, which reduces the LUT size to N × 2^m words, where m is the number of taps in each smaller filter. The figure below shows the separated-LUT DA architecture for a 4-tap FIR filter.

Fig. 5. Separated-LUT DA Architecture for 4-tap FIR filter.

IV. Synthesis Results: This section compares the performance of the LUT-DA architectures for FIR filter design described in Section III. The filter code for each architecture is written in Verilog HDL and synthesized with the Cadence design compiler. For this purpose, 4-tap, 8-tap, and 16-tap FIR filters are implemented with the conventional DA, LUT-Less1, LUT-Less2, and separated-LUT architectures.
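Before looking at the synthesis numbers, the three LUT reductions of Section III can be checked for functional equivalence in software. The sketch below is a behavioral model under assumptions, not the synthesized Verilog: the function names, the convention that address bit i corresponds to tap i, and the coefficient values are all hypothetical, and the tables are rebuilt on every call for clarity rather than efficiency.

```python
def partial_sum_lut(coeffs):
    """Conventional DA-LUT: entry `addr` is the sum of coeffs[i] for each set bit i."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def lut_less1(coeffs, addr):
    """Half-size LUT over h[0..K-2], plus a 2:1 mux and an adder for h[K-1]."""
    k = len(coeffs)
    half = partial_sum_lut(coeffs[:-1])         # 2^(K-1) words instead of 2^K
    low_addr = addr & ((1 << (k - 1)) - 1)      # address bits of the first K-1 taps
    mux_out = coeffs[-1] if (addr >> (k - 1)) & 1 else 0  # 2:1 mux: h[K-1] or 0
    return half[low_addr] + mux_out             # adder


def lut_less2(coeffs, addr):
    """No memory at all: one 2:1 mux per tap, outputs combined by adders."""
    return sum(c if (addr >> i) & 1 else 0 for i, c in enumerate(coeffs))


def separated_lut(coeffs, addr, m):
    """K/m small LUTs of 2^m words each; their outputs are added together."""
    total = 0
    for start in range(0, len(coeffs), m):
        sub_lut = partial_sum_lut(coeffs[start:start + m])
        total += sub_lut[(addr >> start) & ((1 << m) - 1)]
    return total


def separated_lut_words(k, m):
    """Total LUT words for a K-tap filter split into K/m groups of m taps."""
    return (k // m) * (1 << m)


if __name__ == "__main__":
    h = [3, -1, 4, 2]                           # illustrative 4-tap coefficients
    reference = partial_sum_lut(h)
    for addr in range(1 << len(h)):
        assert lut_less1(h, addr) == reference[addr]
        assert lut_less2(h, addr) == reference[addr]
        assert separated_lut(h, addr, 2) == reference[addr]
    print(separated_lut_words(16, 4), "words versus", 1 << 16)
```

For a 16-tap filter split into four groups of four taps, the word count drops from 65536 to 64, at the cost of extra adders to combine the group outputs. This is the exponential-to-linear trade the separated-LUT scheme relies on.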

A. Power Report: The table below compares the 4-tap, 8-tap, and 16-tap FIR filter architectures.

Table 1. Comparison of power dissipation in mW

Fig. 6. Power Report

B. Delay Mechanism: The delay results are shown in the figure below. According to the synthesis report, delay is reduced in the separated-LUT DA architecture.

Table 2. Comparison of delay in ps

Fig. 7. Delay Mechanism

C. Area Report: The area results are shown in Fig. 8. According to the synthesis report, area is reduced in the LUT-Less2 architecture compared with the conventional DA architecture.

Table 3. Comparison of area in µm²

Fig. 8. Area Report

V. Conclusion: MAC and DA are commonly used in digital signal processing and filter design. Different DA-LUT architectures are designed for FIR filters. These three architectures yield reductions in different respects: power, delay, and area. To reduce the LUT size, higher-order filters are divided into several groups of small filters. The distributed arithmetic unit design has run-time coefficient configurability. The target architecture is designed, verified, and simulated in Verilog HDL, and synthesized for power, delay, and area in the Cadence digital lab. For the 16-tap FIR filter, the LUT-Less2 architecture gives 50% improvements in power dissipation and area, and the separated-LUT DA architecture gives a 49% reduction in delay.

VI. References:

[1] Wang Sen, Tang Bin, Zhu Jun, "Distributed Arithmetic for FIR Filter Design on FPGA," International Conference on Communications, Circuits and Systems, October 2007, pp. 620-623.

[2] A. M. Al-Haj, "An FPGA-Based Parallel Distributed Arithmetic Implementation of the 1-D Discrete Wavelet Transform," vol. 29, pp. 241-247, February 2004.

[3] N. S. Pal, H. P. Singh, P. I. Sarin, S. Singh, "Implementation of High Speed FIR Filter using Serial and Parallel Distributed Arithmetic Algorithm," International Journal of Computer Applications, vol. 25, no. 7, pp. 26-32, July 2011.

[4] S. F. Ghamkhari, M. B. Ghaznavi-Ghoushchi, "A Low-Power Low-Area Architecture Design for Distributed Arithmetic (DA) Unit," 20th Iranian Conference on Electrical Engineering (ICEE2012), May 15-17, 2012, Tehran, Iran.

[5] Li Nian-giang, Hou Si-Yu, Cui Shi-Yao, "Application of Distributed FIR Filter Based on FPGA in the Analyzing of ECG Signal," International Conference on Intelligent System Design and Engineering Application (IEEE, 2010), October 11, 2009.

[6] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "LMS Adaptive Filters Using Distributed Arithmetic for High Throughput," IEEE Transactions on Circuits and Systems, vol. 52, no. 7, pp. 1327-1337, 2005.