Critical Path Reduction of Distributed Arithmetic Based FIR Filter


Sunita Badave, Department of Electrical and Electronics Engineering, .I.T, Aurangabad, Maharashtra, India
Anjali Bhalchandra, Department of Electronics and Telecommunication Engineering, G.E.C., Aurangabad, Maharashtra, India

Abstract. Operating speed, which is the reciprocal of the critical path computation time, is one of the prominent design metrics of finite impulse response (FIR) filters. It is largely affected by both the system architecture and the technique used to design the arithmetic modules. The large computation time of multipliers in conventionally designed filters limits the speed of the system architecture. Distributed arithmetic (DA) is one of the techniques used to provide multiplier-free multiplication in the implementation of FIR filters. However, DA suffers from a severe limitation: the look-up table (LUT) grows exponentially with the order of the filter. An improved distributed arithmetic technique is addressed here to design the system architecture of an FIR filter. In the proposed technique, the single large LUT of conventional DA is replaced by a number of smaller indexed LUT pages, to restrict the exponential growth and to reduce system access time. It also eliminates the use of adders. A selection module selects the desired value from the desired page, which reduces the computational time of the critical path. A trade-off between the access times of the LUT pages and the selection module helps to achieve the minimum critical path, so as to maximize the operating speed. Implementations are targeted to Xilinx ISE, Virtex IV devices. Results are presented for FIR filters with an 8-bit input sample width. It is observed that the proposed design performs significantly faster than the conventional and existing DA-based designs.

Keywords: Critical Path; Multiplierless FIR filter; Distributed Arithmetic; LUT Design; Indexed LUT

I. INTRODUCTION

Digital Signal Processing (DSP) systems are generally implemented using sequential circuits. The arithmetic modules in the longest path between any two storage elements are the members of the critical path.
The Critical Path Computation Time (CPCT) determines the minimum feasible clock period, and hence the maximum allowable operating frequency, of a DSP system. The finite impulse response (FIR) digital filter is one of the most widely used Linear Time Invariant (LTI) systems. It has gained popularity in digital signal processing due to its stability, linearity and ease of implementation. However, attention needs to be paid specifically when designing a high-speed FIR filter, because the CPCT is affected by both the system architecture and the techniques used to design the arithmetic modules. For such critical system architecture design, the fixed structure offered by a digital signal processor is not suitable. Meanwhile, the high non-recurring engineering (NRE) costs and long development time of application-specific integrated circuits (ASICs) are making field programmable gate arrays (FPGAs) more attractive for application-specific DSP solutions. FPGAs also offer more design flexibility for arithmetic modules than ASICs. For an Nth order FIR filter, each output sample is the inner product of the impulse response and the input vector of the latest N samples [1], as given in (1):

$$Y(n) = \sum_{k=0}^{N-1} A_k\, X(n-k) \qquad (1)$$

where $A_k$ is the kth filter coefficient.

Direct implementation of (1) is not a cost-effective way to minimize the critical path, for two reasons. First, the critical path increases with the order of the filter. Second, the multiplier is an expensive arithmetic module with respect to both area and computational time. For more than two decades, many researchers [2-10] have worked on various multiplierless techniques for FIR filter design. For constant coefficient multiplication, look-up-table (LUT) multipliers [11-13] and distributed arithmetic (DA) [14-24] are the two memory-based approaches found in FIR filter design. An improved distributed arithmetic technique is addressed here to design the system architecture of an FIR filter, because the operating speed of DA is almost independent of the order of the filter. In recent years, distributed arithmetic has gained substantial popularity due to its regular structure and high throughput capability, which result in a cost-effective and efficient computing structure.
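As a software reference for (1), the direct form can be evaluated with one multiplication per tap. The following Python model is a minimal sketch. It assumes integer coefficients $A_k$ and zero initial filter state, and the function name is illustrative only.

```python
def fir_direct(coeffs, samples):
    """Evaluate y(n) = sum_{k=0}^{N-1} A_k * X(n-k) for every n, eq. (1).

    Samples before n = 0 are taken as zero (zero initial state).
    """
    n_taps = len(coeffs)
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k in range(n_taps):
            if n - k >= 0:
                acc += coeffs[k] * samples[n - k]  # one multiplication per tap
        outputs.append(acc)
    return outputs


# An impulse input returns the coefficients themselves (the impulse response).
print(fir_direct([3, -1, 4, 1], [1, 0, 0, 0, 0]))  # [3, -1, 4, 1, 0]
```

Each output costs N multiplications on the critical path. This is the cost that the memory-based DA approach removes.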
Distributed arithmetic was first introduced by Croisier [14], and further developed by Peled [15] for efficient implementation of digital filters in its serial form. Apart from its several advantages, the DA-based structure faces a serious limitation: memory grows exponentially with the order of the filter. Many researchers [16-27] have addressed this problem. Partial or fully parallel structures that process two or more bits at a time [16,25] have been exploited to overcome the speed limitation inherent to the bit-serial structure. Attempts have also been made to reduce the memory requirement by recasting the input data in Offset Binary Coding (OBC) [16], modified OBC and LUT-less DA-OBC [19], instead of normal binary coding. Yoo and Anderson [22] extended this work and proposed a hardware-efficient LUT-less DA architecture, which gradually replaces the LUT requirement with multiplexer/adder pairs. However, the gain in area reduction is achieved at the cost of an increased critical path over the conventional design. LUT decomposition, or slicing of the LUT, proposed in [23], is another way to restrict the exponential growth of memory. Although this technique addresses the problem of exponential memory growth, its latency and access time depend on the level of decomposition. Because the operating speed of a filter is governed by the worst-case critical path, this paper suggests an improved technique that increases the speed of operation by reducing the critical path. In the proposed technique, the single large LUT of conventional DA is replaced by a number of smaller indexed LUT pages, to restrict the exponential growth and to reduce system access time. Indexing the LUT pages eliminates the adders used in existing techniques [16,17,19,22-24]. A selection module selects the desired value from the desired page and feeds it to the rest of the computation. A trade-off between the access times of the LUTs and the selection module helps to achieve the minimum critical path, so as to maximize the operating speed.

The paper is organized as follows. Section II elaborates the look-up table concept of the conventional and proposed structures. Section III gives the Critical Path Computation Time (CPCT) analysis of the previous and proposed techniques. Section IV presents the realization of the proposed architecture. Section V first presents a component-level access time analysis of the proposed design, followed by a comparison of the operating frequency of the proposed and previous techniques. Section VI concludes the paper.

II. CONVENTIONAL DISTRIBUTED ARITHMETIC ALGORITHM FOR FIR IMPLEMENTATION

Distributed arithmetic is one of the preferred methods of FIR filter implementation, because it eliminates the need for multipliers, particularly when the multiplication is with constant coefficients. With this technique, the sum-of-products terms in (1) can easily be transformed into additions. Let B be the word length of the input samples. In unsigned binary form, X(n) can then be represented as:

$$X(n) = \sum_{i=0}^{B-1} x_{n,i}\, 2^{i} \qquad (2)$$

where $x_{n,i}$ is the ith bit of X(n). Substituting X(n) from (2) into (1), the inner product can be expressed as:

$$Y(n) = \sum_{k=0}^{N-1} A_k \sum_{i=0}^{B-1} x_{n-k,i}\, 2^{i} \qquad (3)$$

Interchanging the order of summation in (3) gives:

$$Y(n) = \sum_{i=0}^{B-1} 2^{i} \sum_{k=0}^{N-1} A_k\, x_{n-k,i} \qquad (4)$$

Equation (4) can be written in compressed form as:

$$Y(n) = \sum_{i=0}^{B-1} 2^{i}\,\gamma_i, \qquad \gamma_i = A_{N-1}\,x_{n-N+1,i} + A_{N-2}\,x_{n-N+2,i} + \dots + A_1\,x_{n-1,i} + A_0\,x_{n,i} \qquad (5)$$

where $x_{n-k,i} \in \{0,1\}$. Equation (5) therefore yields $2^N$ possible values of $\gamma$. All of these values can be precomputed and stored as a look-up table, as shown in Table I for N = 4. The filtering operation is performed by successively accumulating and shifting these precomputed values, based on the bit address formed by the input samples X(n).

TABLE I. CONVENTIONAL LUT DESIGN

LUT address bits    LUT contents
x3 x2 x1 x0
0  0  0  0          0
0  0  0  1          A_0
0  0  1  0          A_1
0  0  1  1          A_1 + A_0
      ...           ...
1  1  1  1          A_3 + A_2 + A_1 + A_0

A method is proposed to choose the size of LUT that gives the minimum Critical Path Computation Time for the LUT unit. Let N = n + m, where n and m are arbitrary positive integers. The single large LUT of size $2^N$ in the conventional design is converted into $2^m$ LUT pages, each with $2^n$ memory locations. Applying this concept to (5), the terms of $\gamma$ can be divided into two groups: n LSB terms and m MSB terms. Writing $x_k$ for the address bit that selects coefficient $A_k$ (as in Tables I to III), this is represented by:

$$\gamma = \underbrace{A_0 x_0 + A_1 x_1 + \dots + A_{n-1} x_{n-1}}_{n \text{ LSB terms}} + \underbrace{A_n x_n + \dots + A_{n+m-1} x_{n+m-1}}_{m \text{ MSB terms (index term } I)} \qquad (6)$$

The n LSB bits define the size of each LUT page, whereas the m MSB bits define the number of LUT pages. Instead of holding plain coefficient sums as in the conventional look-up table, the proposed LUT holds indexed sums of filter coefficients.

TABLE II. PROPOSED LUT DESIGN

n-LUT address bits   LUT contents of each page
x3 x2 x1 x0
0  0  0  0           I + 0
0  0  0  1           I + A_0
0  0  1  0           I + A_1
0  0  1  1           I + A_1 + A_0
0  1  0  0           I + A_2
0  1  0  1           I + A_2 + A_0
0  1  1  0           I + A_2 + A_1
0  1  1  1           I + A_2 + A_1 + A_0
1  0  0  0           I + A_3
1  0  0  1           I + A_3 + A_0
1  0  1  0           I + A_3 + A_1
1  0  1  1           I + A_3 + A_1 + A_0
1  1  0  0           I + A_3 + A_2
1  1  0  1           I + A_3 + A_2 + A_0
1  1  1  0           I + A_3 + A_2 + A_1
1  1  1  1           I + A_3 + A_2 + A_1 + A_0

TABLE III. INDEX TERM FOR EACH LUT PAGE

Page number   m-address bits   Index term I for LUT page
              x5 x4
0             0  0             0
1             0  1             A_4
2             1  0             A_5
3             1  1             A_5 + A_4
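The construction of Tables I to III can be checked in software. The following Python sketch builds the conventional $2^N$-word LUT and the $2^m$ indexed pages of $2^n$ words each. It then confirms that picking a page with the m MSB address bits and a word with the n LSB bits returns the same $\gamma$ as the single large LUT. The function names and coefficient values are hypothetical, chosen only for illustration.

```python
def conventional_lut(coeffs):
    """Table I: the word at address a holds the sum of every A_k whose bit x_k is 1."""
    n_taps = len(coeffs)
    return [sum(coeffs[k] for k in range(n_taps) if (addr >> k) & 1)
            for addr in range(1 << n_taps)]


def indexed_lut_pages(coeffs, n):
    """Tables II and III: 2^m pages of 2^n words, with N = n + m.

    Word a of page p holds I_p + (sum of A_0..A_{n-1} selected by a), where the
    index term I_p is the sum of A_n..A_{N-1} selected by the m MSB bits p.
    """
    low_sums = conventional_lut(coeffs[:n])      # contents common to every page
    index_terms = conventional_lut(coeffs[n:])   # Table III: index term I per page
    return [[i_p + s for s in low_sums] for i_p in index_terms]


def indexed_lookup(pages, n, addr):
    """The m MSB bits pick the page (2^m:1 mux); the n LSB bits pick the word in it."""
    return pages[addr >> n][addr & ((1 << n) - 1)]


coeffs = [5, -3, 7, 2, 11, -6]          # hypothetical A_0..A_5 (6th order example)
table_1 = conventional_lut(coeffs)      # single LUT of 2^6 = 64 words
pages = indexed_lut_pages(coeffs, n=4)  # m = 2: four pages of 16 words
assert all(indexed_lookup(pages, 4, a) == table_1[a] for a in range(64))
print([p[0] for p in pages])            # index terms I: [0, 11, -6, 5]
```

The pages together still hold $2^m \cdot 2^n = 2^N$ words. What changes is that each page is addressed by only n bits, and a small page selector picks the result. The access-time analysis in the following sections builds on this property.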

(IJACSA) International Journal of Advanced Computer Science and Applications,

A page selector module selects the desired output from one of the LUT pages, addressed by the m bits. A suitable combination of n and m minimizes the combined execution time of the LUT page and the page selector module, and so attains the maximum operating frequency. Tables II and III give the LUT page structure of a 6th order filter with n = 4 and m = 2 and the index term of each page, respectively. Each LUT page contains sums of filter coefficients plus the index term I.

III. CRITICAL PATH COMPUTATION TIME ANALYSIS OF PROPOSED ARCHITECTURE

This section elaborates the CPCT analysis [13] of the conventional [14-16], LUT-less [19,22], sliced [16,17,23,24] and proposed DA-based FIR filter techniques. These designs were chosen because they are the most comparable with the proposed technique. The conventional form of the distributed arithmetic FIR filter, given in Fig. 1, consists of a bank of input registers, an LUT unit, and an accumulator/shifter unit. Apart from these hardware units, it needs a control unit, which defines the sequence of filter operation.

[Fig. 1. Functional block diagram of conventional DA based FIR filter: input register, LUT structure, accumulator/shifter]

Serially arriving input data values X(n) are stored in parallel form in the input register bank. The bank is shifted right in every clock cycle, creating a word that is used to address the LUT. Successive shifting and accumulation of the LUT outputs over B cycles gives Y(n). Fig. 2 shows the data flow graph (DFG) of the conventional DA-based FIR filter. It consists of node L for the LUT, node A for the accumulator and node S for the shifter. The access times of L and A, $C_L$ and $C_{as}$ respectively, contribute to the critical path. The CPCT of the conventional DA-based FIR filter is therefore:

$$\text{CPCT}_{(conv)} = C_L + C_{as} \qquad (7)$$

[Fig. 2. Data flow graph of conventional DA based FIR filter]
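The bit-serial operation just described (parallel input registers, one LUT access per bit cycle, shift-and-accumulate over B cycles) can be modelled in a few lines. This Python sketch makes two assumptions: the inputs are unsigned B-bit integers, as in (2), and each looked-up $\gamma_i$ is weighted by $2^i$ rather than a physical right-shifting accumulator being modelled. The function name is hypothetical.

```python
def da_conventional(coeffs, samples, bits):
    """Conventional bit-serial DA FIR filter (Figs. 1 and 2).

    One LUT access per bit cycle; B = `bits` cycles per output sample.
    Inputs are unsigned `bits`-wide integers; samples before time 0 are zero.
    """
    n_taps = len(coeffs)
    # Table I: all 2^N partial sums gamma, precomputed once.
    lut = [sum(coeffs[k] for k in range(n_taps) if (a >> k) & 1)
           for a in range(1 << n_taps)]
    outputs = []
    for t in range(len(samples)):
        # Input register bank: X(t), X(t-1), ..., X(t-N+1) held in parallel.
        regs = [samples[t - k] if t - k >= 0 else 0 for k in range(n_taps)]
        acc = 0
        for i in range(bits):
            # Bit i of every register forms the LUT address word.
            addr = sum(((regs[k] >> i) & 1) << k for k in range(n_taps))
            acc += lut[addr] << i   # accumulate 2^i * gamma_i, eq. (5)
        outputs.append(acc)
    return outputs


print(da_conventional([1, 2], [3, 1], bits=2))  # [3, 7]
```

Only the LUT read and the accumulate step lie between registers in each bit cycle, which is what (7) expresses.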
A. LUT-less DA based FIR filter

Exponential growth of the LUT is the key issue when designing a DA-based FIR filter. Eliminating the LUT altogether is an approach found in [13,24] to overcome this growth. In such a LUT-less structure, shown in Fig. 3, the LUT is replaced by multiplexer-adder pairs. The data generated on-line by the multiplexers are accumulated to create the filter output.

[Fig. 3. LUT-less DA based FIR filter: input register, multiplexers, adder tree, accumulator/shifter]

The DFG of the LUT-less DA-based FIR filter, shown in Fig. 4, consists of multiplexer nodes M, adder nodes $T_a$ and an accumulator node A. The number of multiplexers is governed by the order of the filter, but the multiplexers operate concurrently, so the access time of only one multiplexer contributes to the CPCT. Assuming the adders in the adder tree are arranged in 4:2 form, the access time of $\log_2 N$ adders is taken into account when calculating the adder tree contribution $C_a$ to the CPCT:

$$C_a = \log_2 N \times T_a \qquad (8)$$

[Fig. 4. Data flow graph of LUT-less DA based FIR filter (number of M nodes = order of filter)]

As (8) indicates, $C_a$ is highly filter-order dependent. The CPCT of the structure becomes:

$$\text{CPCT}_{(LUTless)} = C_M + C_a + C_{as} \qquad (9)$$

where $C_M$ is the access time of a multiplexer, $C_a$ the access time of the adder tree, and $C_{as}$ the access time of the accumulator/shifter unit.

B. Sliced LUT based FIR filter

Another well-known approach to restricting the exponential growth of the LUT, found in [21,22,27], is the use of multiple memory banks. More recently, Longa and Miri [23] showed that a DA FIR filter becomes area efficient when a single large LUT is replaced by a number of smaller 4-input LUTs. However, this arrangement adds the burden of an adder tree, which is required to add the partial terms generated by each smaller LUT. Such an LUT arrangement is generally referred to as partitioning or slicing of the LUT. Fig. 5 shows the architectural details of the sliced DA-based FIR filter.
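Both adder-based alternatives produce the same $\gamma$ as Table I. They differ in the number of adders placed behind the multiplexers or memories. The following Python sketch counts those adders and the depth of the tree. It assumes plain two-input adders arranged as a balanced binary tree, a simplification of the 4:2 arrangement mentioned for (8). Under that assumption the LUT-less form needs N - 1 adders with depth $\lceil \log_2 N \rceil$, and the sliced form needs (N/4) - 1 adders with depth $\lceil \log_2 (N/4) \rceil$. All function names are hypothetical.

```python
def adder_tree(values):
    """Balanced tree of two-input adders; returns (sum, number_of_adders, depth)."""
    level = list(values)
    adders = depth = 0
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level) - 1, 2):
            nxt.append(level[j] + level[j + 1])
            adders += 1
        if len(level) % 2:                 # odd element passes to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], adders, depth


def gamma_lutless(coeffs, addr):
    """LUT-less DA (Fig. 3): the k-th mux passes A_k when bit x_k is 1, else 0."""
    mux_out = [c if (addr >> k) & 1 else 0 for k, c in enumerate(coeffs)]
    return adder_tree(mux_out)


def gamma_sliced(coeffs, addr, slice_inputs=4):
    """Sliced DA (Fig. 5): N/4 small 4-input LUTs, partial sums added by a tree."""
    partials = []
    for s in range(0, len(coeffs), slice_inputs):
        part = coeffs[s:s + slice_inputs]
        lut = [sum(c for k, c in enumerate(part) if (a >> k) & 1)
               for a in range(1 << len(part))]        # one 16-word slice
        partials.append(lut[(addr >> s) & ((1 << len(part)) - 1)])
    return adder_tree(partials)


coeffs = list(range(1, 17))                  # hypothetical 16-tap example
print(gamma_lutless(coeffs, 0xFFFF))         # (136, 15, 4)
print(gamma_sliced(coeffs, 0xFFFF))          # (136, 3, 2)
```

The adder-tree depth is the term $C_a$ that appears in (9) and (10). The indexed LUT technique is designed to remove it.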

[Fig. 5. Sliced LUT based FIR filter: input register bank, four 4-input sliced LUTs (LUT0 to LUT3), adder tree, accumulator/shifter]

The data flow graph of the sliced LUT-based FIR filter, shown in Fig. 6, consists of concurrently operating 4-input LUT nodes $L_s$, adder nodes $T_a$, and an accumulator and shifter node S. In this architecture, the number of adders in the adder tree is governed by the number of slices. Assuming the order of the filter is divisible by 4, an Nth order FIR filter has N/4 slices and (N/4) - 1 adders. The LUT node $L_s$, $\lceil \log_2 (N/4) \rceil$ adders and the accumulator are therefore the members of the critical path, and the CPCT of the structure is:

$$\text{CPCT}_{(Slice)} = C_{SL} + C_a + C_{as} \qquad (10)$$

where $C_{SL}$ is the access time of one slice of the LUT, $C_a = \lceil \log_2 (N/4) \rceil\, T_a$ is the access time of the adder tree, $T_a$ is the access time of an adder, and $C_{as}$ is the access time of the accumulator/shifter.

[Fig. 6. Data flow graph of sliced LUT based FIR filter (number of L_s nodes = number of slices)]

The slicing technique reduces the LUT access time from $C_L$ to $C_{SL}$, but it adds the overhead of the adder tree access time $C_a$ to $\text{CPCT}_{(Slice)}$.

C. Indexed LUT based FIR filter

The LUT-less and sliced LUT techniques restrict the exponential growth [22,23], but they add the access time of an adder tree. An indexed LUT based FIR filter technique is therefore proposed to eliminate the adder tree. In the proposed Indexed LUT (ILUT) structure, node L of Fig. 2 is replaced by smaller indexed LUTs $L_i$ and a multiplexer.

[Fig. 7. Data flow graph of indexed LUT based FIR filter]

Fig. 7 shows the DFG of the proposed design, derived from (6). The CPCT of this structure, contributed by the $L_i$, M and S nodes, is now:

$$\text{CPCT}_{(Index)} = C_i + C_m + C_{as} \qquad (11)$$

where $C_i$ is the access time of an indexed LUT, $C_m$ the access time of the multiplexer, and $C_{as}$ the access time of the accumulator/shifter. The access times $C_i$ and $C_m$ are interdependent.
The LUT size varies exponentially and the multiplexer size varies linearly. Trading one against the other helps to choose the optimum CPCT of the structure, and hence improves the overall operating frequency of the filter. The structure also eliminates the need for an adder tree, which further helps to improve the operating frequency.

IV. REALIZATION OF PROPOSED ARCHITECTURE

The following subsections elaborate the proposed structure of the indexed LUT based FIR filter. It is built from four major components: a bank of input registers, a look-up-table unit, an accumulator/shifter unit and a control unit.

A. Input register bank

The register bank, shown in Fig. 8, is built from N serial-in parallel-out shift registers and accepts the input samples X(n), n = 0, 1, ..., N - 1. On every clock pulse, the register contents shift right and generate B terms of length N.

[Fig. 8. Input register bank and address bifurcation into n address bits and m address bits]

The LUT address generated by the register bank is split into two address groups, n and m. The n LSB bits give the address within an LUT page, whereas the m MSB bits select one of the $2^m$ LUT pages.

B. Proposed LUT unit

The indexed LUT based FIR filter comprises $2^m$ indexed LUT pages, each of size $2^n$, together with a $2^m$:1 multiplexer unit, driven by the m bits, that acts as the page selection module. The multiplexer selects the desired value from the desired page. Fig. 9 shows the structural details of the example considered in Section II, a 6th order FIR filter with n = 4 and m = 2. Four LUT pages, each with 16 locations, are connected in parallel to a common set of 4 address lines. A 4:1 multiplexer unit selects the appropriate output for the next stage.
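An end-to-end software model of this datapath shows that no adder tree is needed after the LUT. The datapath consists of the register bank, the address bifurcation into n and m bits, the page selection and the shift-accumulate. The Python sketch below is illustrative only. The function name, the unsigned-input assumption and the choice to weight each looked-up word by $2^i$ (instead of modelling a right-shifting accumulator) are assumptions of this sketch, not the paper's RTL.

```python
def da_indexed(coeffs, samples, bits, n):
    """Bit-serial DA using 2^m indexed LUT pages of 2^n words each (N = n + m).

    Assumes unsigned `bits`-wide inputs; samples before time 0 are zero.
    """
    n_taps = len(coeffs)
    m = n_taps - n

    def sums(cs):                           # all subset sums, addressed by bit pattern
        return [sum(c for k, c in enumerate(cs) if (a >> k) & 1)
                for a in range(1 << len(cs))]

    low, index_terms = sums(coeffs[:n]), sums(coeffs[n:])
    pages = [[index_terms[p] + v for v in low] for p in range(1 << m)]

    outputs = []
    for t in range(len(samples)):
        # Input register bank: N parallel words, latest sample first.
        regs = [samples[t - k] if t - k >= 0 else 0 for k in range(n_taps)]
        acc = 0
        for i in range(bits):
            addr = sum(((regs[k] >> i) & 1) << k for k in range(n_taps))
            n_bits = addr & ((1 << n) - 1)  # LSB group: location inside a page
            m_bits = addr >> n              # MSB group: page select (2^m:1 mux)
            acc += pages[m_bits][n_bits] << i   # shift-accumulate, no adder tree
        outputs.append(acc)
    return outputs


print(da_indexed([1, 2, 3], [1, 2], bits=2, n=2))  # [1, 4]
```

For any split n + m = N, the result equals the direct form (1). The split therefore affects only the timing, through the trade-off between $C_i$ and $C_m$, and never the output values.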

[Fig. 9. Proposed structure of LUT unit: four indexed LUT pages with index terms I = 0, A_4, A_5 and A_5 + A_4, each addressed by the n address bits and holding I + 0 to I + A_3 + A_2 + A_1 + A_0 (Table II); a 4:1 MUX driven by the m address bits delivers the partial output]

C. Accumulator and Shifter Unit

The accumulator and the shifter are two separate combinational units, but together they compute the dot-product term of the filter output. Their hardware complexity is strongly influenced by the way the LUT is addressed, which determines the shift applied in the accumulator/shifter unit to generate the partial products.

D. Control Unit

The control unit is a finite state machine, shown in Fig. 10. It defines the sequence of operation and has overall control of the filtering operation.

[Fig. 10. Control unit of proposed structure: states IDLE, INIT, FETCH, LUT, SHIFT and OUT; transitions on RESET = 1, E = 1, CLK = 1, CNT ≠ 8 and CNT = 8]

The filtering operation remains in the idle state while reset is applied. It starts on the enable signal E and performs a number of iterations equal to the input precision, one per clock cycle. At the end of the count, it delivers the filter output, and operation begins with the next fetch cycle.

V. PERFORMANCE ANALYSIS

Performance is evaluated in terms of operating frequency. The design is implemented on a Virtex IV FPGA for each filter order N and for all possible combinations of n and m, as shown in Table IV. Each node of the proposed structure is analyzed for its contribution to the CPCT, for filter orders 4 to 8. Table IV gives the filter operating frequency as the access times of the LUT page, $C_i$, and of the multiplexer unit, $C_m$, vary.

Fig. 11 shows a graphical representation for the 8th order FIR filter. It indicates that as n increases, the access time of the LUT page, $C_i$, increases exponentially, while the access time of the multiplexer, $C_m$, decreases linearly. Let $f_{max}$ be the maximum operating frequency and $T_{sample}$ the minimum time required to process each output sample. Then:

$$T_{sample} \geq \text{CPCT} = C_i + C_m + C_{as} \qquad (12)$$

$$f_{max} = \frac{1}{T_{sample}} \leq \frac{1}{C_i + C_m + C_{as}} \qquad (13)$$

The CPCT of the filter is minimal at the point of intersection of the LUT access time $C_i$ and the MUX access time $C_m$, which gives the maximum operating frequency. The filter design corresponding to these values of n and m is therefore treated as the optimized design.

TABLE IV. ACCESS TIME ANALYSIS OF LUT UNIT MODULES

Order of   Address line     LUT unit access time   Operating
filter     distribution     analysis               freq. (MHz)
           n     m          C_i      C_m
8          7     1          6.58     3.6            151.389
           6     2          5.45     4.06           160.937
           5     3          5.02     4.46           155.876
           4     4          4.65     4.8            184.834
           3     5          4.65     5.16           169.544
           2     6          4.6      5.5            168.714
           1     7          3.84     6.1            176.625
7          6     1          5.45     3.6            189.92
           5     2          5.02     4.06           180.874
           4     3          4.65     4.46           183.441
           3     4          4.65     4.8            191.18
           2     5          4.6      5.16           182.45
           1     6          3.84     5.5            190.13
6          5     1          5.02     3.6            190.3
           4     2          4.65     4.06           192.417
           3     3          4.65     4.46           205.495
           2     4          4.6      4.8            190.389
           1     5          3.84     5.16           192
5          4     1          4.65     3.6            206.793
           3     2          4.65     4.06           228.645
           2     3          4.6      4.46           239.664
           1     4          3.84     4.8            215.736
4          3     1          4.65     3.6            242.93
           2     2          4.6      4.06           242.93
           1     3          3.84     4.46           244.09

[Fig. 11. Relation between access time analysis of LUT unit modules and operating frequency of 8th order FIR filter]
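The selection rule of Section V can be replayed on the measured data of Table IV. For each filter order, this Python sketch picks the split where the LUT-page and multiplexer access times are closest, which serves as a stand-in for "the point of intersection". It then compares that split with the split that achieved the highest measured operating frequency. The data are transcribed from Table IV, and the helper names are hypothetical.

```python
# Table IV rows: order -> list of (n, m, C_i, C_m, measured operating frequency).
TABLE_IV = {
    8: [(7, 1, 6.58, 3.6, 151.389), (6, 2, 5.45, 4.06, 160.937),
        (5, 3, 5.02, 4.46, 155.876), (4, 4, 4.65, 4.8, 184.834),
        (3, 5, 4.65, 5.16, 169.544), (2, 6, 4.6, 5.5, 168.714),
        (1, 7, 3.84, 6.1, 176.625)],
    7: [(6, 1, 5.45, 3.6, 189.92), (5, 2, 5.02, 4.06, 180.874),
        (4, 3, 4.65, 4.46, 183.441), (3, 4, 4.65, 4.8, 191.18),
        (2, 5, 4.6, 5.16, 182.45), (1, 6, 3.84, 5.5, 190.13)],
    6: [(5, 1, 5.02, 3.6, 190.3), (4, 2, 4.65, 4.06, 192.417),
        (3, 3, 4.65, 4.46, 205.495), (2, 4, 4.6, 4.8, 190.389),
        (1, 5, 3.84, 5.16, 192.0)],
    5: [(4, 1, 4.65, 3.6, 206.793), (3, 2, 4.65, 4.06, 228.645),
        (2, 3, 4.6, 4.46, 239.664), (1, 4, 3.84, 4.8, 215.736)],
    4: [(3, 1, 4.65, 3.6, 242.93), (2, 2, 4.6, 4.06, 242.93),
        (1, 3, 3.84, 4.46, 244.09)],
}


def intersection_split(rows):
    """Split (n, m) where the C_i and C_m curves come closest (Section V rule)."""
    n, m, *_ = min(rows, key=lambda r: abs(r[2] - r[3]))
    return n, m


def measured_best_split(rows):
    """Split (n, m) with the highest measured operating frequency in Table IV."""
    n, m, *_ = max(rows, key=lambda r: r[4])
    return n, m


for order, rows in TABLE_IV.items():
    print(order, intersection_split(rows), measured_best_split(rows))
```

For orders 5 to 8, both rules pick the same split: (4, 4), (3, 4), (3, 3) and (2, 3) for orders 8, 7, 6 and 5. For order 4, the closest-access-time rule picks n = 2, m = 2 (242.93 in Table IV), while the highest measured value, 244.09, occurs at n = 1, m = 3. This agrees with the observation that the 4th order case shows little spread between designs.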

[Fig. 12. Relation of maximum operating frequency with order of filter]

This technique can be extended to any desired filter order. Fig. 12 shows filter performance up to order 256. The results of the proposed technique are compared with the conventional, LUT-less [22] and sliced LUT [23] techniques, which were originally implemented on an Altera Stratix FPGA chip. To overcome the platform differences, these techniques were faithfully re-implemented on the same platform as the proposed technique.

TABLE V. OPERATING FREQUENCY COMPARISON OF VARIOUS ARCHITECTURES

Order of   Operating frequency of DA based filter (MHz)
filter     Conventional   LUT-less   Sliced    Proposed design
4          242.4          242.93     240.13    244.09
5          239.01         239.06     220.037   239.664
6          200.95         174.074    200.122   205.495
7          184.65         175.503    185.685   191.18
8          176.22         174.28     167.726   184.834

The filter coefficients were obtained from FDATool, a MATLAB toolbox, and were truncated and scaled to 8-bit precision. The Xilinx Integrated Software Environment (ISE) was used for synthesis and implementation of the designs. To validate correct functionality, each implementation was simulated with random input using the simulation tool provided by Xilinx. Table V presents a comparative study of the maximum operating speed of the conventional, LUT-less, sliced and proposed DA-based filter techniques, and Fig. 13 shows it graphically.

[Fig. 13. Comparison of operating frequency]

One obvious observation from Table V is that the operating frequency decreases as the filter order increases.

TABLE VI. STRUCTURAL COMPLEXITY OF PREVIOUS AND PROPOSED DESIGNS

Structural          Conventional     LUT-less           Sliced               Proposed design
complexity
Input register      N × B            N × B              N × B                N × B
Memory bits         C = 2^N × B      -                  S = (a × 2^l) × B    I = (2^m × 2^n) × B
Decoder             N:2^N            -                  a (l:2^l)            2^m (n:2^n)
Number of adders    -                N - 1              a - 1                -
Depth of adders     -                B + log2 N         B + log2 a           -
Multiplexers        -                -                  -                    2^m:1
CPCT                C_L + C_as       C_M + C_a + C_as   C_SL + C_a + C_as    C_i + C_m + C_as
Latency             B + 1            B + 1              B + 1                B + 1
Throughput          B + 2            B + 2              B + 2                B + 2
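The closed-form entries of Table VI can be evaluated for a concrete configuration. The following Python sketch is a hypothetical helper, not a tool from the paper. It computes memory bits, decoders, adders and the page multiplexer for given N, B, slice parameters (a, l) and page split (n, m). The formulas imply that the proposed pages hold $2^m \cdot 2^n \cdot B = 2^N \cdot B$ bits, the same total as the conventional LUT. The difference lies in the decoders ($2^m$ decoders of size n:$2^n$ instead of one N:$2^N$) and in the added $2^m$:1 multiplexer.

```python
def structural_complexity(N, B, a=None, l=None, n=None, m=None):
    """Evaluate the Table VI formulas for an Nth order filter with B-bit words.

    a, l: number and input width of LUT slices (a * l = N) for the sliced design.
    n, m: page-address and page-select widths (n + m = N) for the proposed design.
    """
    result = {
        "conventional": {"memory_bits": (2 ** N) * B,
                         "decoder": f"1 x {N}:{2 ** N}", "adders": 0},
        "lutless": {"memory_bits": 0, "decoder": "none", "adders": N - 1},
    }
    if a is not None and l is not None:
        assert a * l == N, "slices must cover all N taps"
        result["sliced"] = {"memory_bits": a * (2 ** l) * B,
                            "decoder": f"{a} x {l}:{2 ** l}", "adders": a - 1}
    if n is not None and m is not None:
        assert n + m == N, "page address and page select bits must total N"
        result["proposed"] = {"memory_bits": (2 ** m) * (2 ** n) * B,
                              "decoder": f"{2 ** m} x {n}:{2 ** n}",
                              "adders": 0, "mux": f"{2 ** m}:1"}
    return result


# 8th order, 8-bit words, two 4-input slices, n = m = 4 (the optimum in Table IV).
for design, row in structural_complexity(8, 8, a=2, l=4, n=4, m=4).items():
    print(design, row)
```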
It is also observed that the operating frequency of the proposed technique is higher than that of the conventional and existing [22,23] techniques. At 4th order there is little gain in frequency, because the results of all techniques are dominated by the technology platform; however, the gain grows with the order of the filter. The structural complexities of an Nth order filter are analyzed, and performance is compared for random input samples x(n). The word length of the input samples and filter coefficients is assumed to be B bits, so the input register bank has the same size in all designs under consideration. Latency and throughput, in clock cycles, are the same in all DA-based structures; however, the operating speed of each technique makes the actual times differ. An Nth order conventional DA-based FIR filter requires a memory array of $2^N \times B$ bits and an N:$2^N$ decoder. Its CPCT, $(C_L + C_{as})$, increases exponentially due to the exponential rise in $C_L$. $C_{as}$, in contrast, is independent of the filter order and is therefore almost constant across all structures. The structural complexities of the conventional DA-based FIR filter are taken as the benchmark for performance comparison. Slicing the single large memory reduces the memory requirement from $2^N \times B$ bits in the conventional design to $(a \times 2^l) \times B$, where a and l are factors of N. The decoder also changes, from a single N:$2^N$ decoder to a decoders of size l:$2^l$. Because this technique generates multiple partial terms, at least a - 1 adders are needed to form the coefficient sum. Replacing the single large LUT with smaller LUTs reduces the LUT access time from $C_L$ to $C_{SL}$, but it adds the adder access time $C_a$, which tends to increase the CPCT of the structure. The LUT-less technique selects filter coefficients on-line with multiplexers. This eliminates the memory and the corresponding decoder, at the cost of N - 1 adders. Since the LUT is replaced by multiplexers and adders, $C_M$ and $C_a$ are the contributors to the CPCT, and both are highly filter-order dependent.
In the proposed technique, indexing of the LUT pages reduces the LUT access time, which becomes the access time of one indexed page instead of $C_L$. It also eliminates the adder delay $C_a$, a prime contributor to the CPCT of the LUT-less and sliced-LUT based techniques. It adds only a small burden, the delay $C_m$ of the LUT page selection module, to the CPCT of the structure. The overall CPCT is nevertheless reduced, which increases the operating frequency. This rise in frequency is significant for higher filter orders, as indicated in Table V.

VI. CONCLUSION

For high-speed FIR filter implementation in distributed arithmetic, the exponential rise of memory access time with the number of filter coefficients has always been considered a fundamental drawback. LUT-less and sliced-LUT based techniques restrict this exponential growth, but they need adders to generate the partial terms. In the LUT-less technique, the number of adders and the depth of the adder tree are governed by the filter order. In the sliced-LUT based technique, the number of slices defines the number of adders; even for a particular filter order, the number of adders increases with the number of slices, which tends to increase the CPCT of the structure. An innovative technique to reduce the CPCT of the FIR filter has been designed and implemented successfully, leading to an increase in operating frequency. Indexing of the LUT restricts the exponential growth and also completely eliminates the need for adders, which results in a significant reduction in CPCT and maximizes the operating frequency.
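The difference between the two partial-term paths can be illustrated with a small functional model. The Python sketch below is a hypothetical illustration, not the authors' design. It assumes unsigned input samples (a two's-complement version would subtract the sign-bit term), a bit-serial LSB-first formulation, and that "indexing of LUT pages" means the high-order address bits select one page through a multiplexer while the low-order bits index inside that page. The names `da_paged`, `da_sliced` and `adder_tree_depth` and the coefficient and sample values are invented for illustration. Both variants reproduce the direct-form sum. The sliced variant combines its K slice outputs with K - 1 adders (tree depth ⌈log2 K⌉) for every partial term, whereas the paged variant replaces them with a page selection.

```python
import math

def build_lut(coeffs):
    """DA look-up table: entry a holds the sum of coeffs[k] for every set bit k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]

def bit_address(samples, b):
    """Form a LUT address from bit b of every tap sample (tap k -> address bit k)."""
    return sum(((x >> b) & 1) << k for k, x in enumerate(samples))

def da_paged(samples, coeffs, page_bits, width):
    """Hypothetical paged LUT: the top `page_bits` address bits select a page
    (the multiplexer), the remaining bits index inside that page, so each
    partial term is a single read with no adder."""
    lut = build_lut(coeffs)
    offset_bits = len(coeffs) - page_bits
    size = 1 << offset_bits
    pages = [lut[p * size:(p + 1) * size] for p in range(1 << page_bits)]
    acc = 0
    for b in range(width):                      # bit-serial, LSB first
        addr = bit_address(samples, b)
        partial = pages[addr >> offset_bits][addr & (size - 1)]
        acc += partial << b                     # shift-accumulate
    return acc

def da_sliced(samples, coeffs, slices, width):
    """Sliced LUT: one small LUT per group of taps; the slice outputs are
    summed (slices - 1 additions) to form each partial term."""
    n = len(coeffs)
    assert n % slices == 0, "taps must divide evenly into slices"
    m = n // slices
    luts = [build_lut(coeffs[s * m:(s + 1) * m]) for s in range(slices)]
    acc = 0
    for b in range(width):
        partial = sum(luts[s][bit_address(samples[s * m:(s + 1) * m], b)]
                      for s in range(slices))   # the adder tree
        acc += partial << b
    return acc

def adder_tree_depth(slices):
    """Levels of a balanced binary adder tree combining `slices` LUT outputs."""
    return math.ceil(math.log2(slices)) if slices > 1 else 0

coeffs = [3, -1, 4, 1, -5, 9, 2, 6]          # hypothetical 8-tap filter
samples = [12, 7, 0, 255, 31, 128, 64, 5]    # x[n], x[n-1], ... as unsigned 8-bit
direct = sum(c * x for c, x in zip(coeffs, samples))
assert da_paged(samples, coeffs, page_bits=2, width=8) == direct
assert da_sliced(samples, coeffs, slices=4, width=8) == direct
print(direct, adder_tree_depth(4))           # prints "1439 2"
```

Timing is not modelled here. The sketch only shows which operations sit on the partial-term path: a page read plus selection in the paged case, and slice reads plus an adder tree that deepens with the number of slices in the sliced case.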
AUTHOR PROFILES

Sunita Mukund Badave received the B.E. degree in Electrical (Electronics Specialization) from Shivaji University in 1989 and the M.E. degree in Electrical from Dr. B.A.M. University, Aurangabad, India, in 1998. She is currently working toward the Ph.D. degree in Electronics at Dr. B.A.M. University. Her research interests include architectures and circuit design for digital signal processing. She has presented nearly 16 technical papers nationally and internationally. Mrs. S. M. Badave is a member of the Institute of Electronics and Telecommunication Engineers (IETE), India, and a life member of the Indian Society for Technical Education (ISTE), India. She is also a member of IAENG, the International Association of Engineers.

Anjali S. Bhalchandra received the B.E. degree in Electronics and Telecommunication and the M.E. degree in Electronics in 1985 and 1992, respectively. She completed her Ph.D. in Electronics at S.R.T.M. University, Nanded, India, in 2004. She has a scientific and technical background covering the areas of electronics and communication. Currently, she is Head of the Electronics and Telecommunication Engineering Department and Associate Professor at Government College of Engineering, Aurangabad. Her research interests include image processing, signal processing and communication. She has published more than 50 technical papers in various reputed journals and conference proceedings. Dr. Bhalchandra is a Fellow of the Institution of Engineers (IE), India, and a life member of the Indian Society for Technical Education (ISTE), India.