Designing an Efficient and Secured LUT Approach for Area Based Occupations


D. Jahnavi (1), Y. Ravikiran Varma (2)
(1) M.Tech scholar, E.C.E, Sreenivasa Institute of Technology and Management Studies, Chittoor
(2) Assistant Professor, E.C.E, Sreenivasa Institute of Technology and Management Studies, Chittoor

ABSTRACT: This project presents the implementation of a multiplier with an enhanced LUT technique. Anti-symmetric product coding (APC) and odd-multiple-storage (OMS) are techniques for lookup-table (LUT) design for memory-based multipliers used in digital signal processing applications. Each of these two techniques separately reduces the LUT size by half. This project presents a different form of APC and a modified OMS scheme so that the two can be combined for efficient memory-based multiplication. The proposed combined approach reduces the LUT size to one-fourth of the conventional LUT. A simple technique for selective sign reversal is also suggested for use in the proposed design. It is shown that the proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition.

Keywords: Memory-based computation, anti-symmetric product coding, odd-multiple-storage, lookup table, digital signal processing

INTRODUCTION: In terms of the algorithms employed, FPGA technology mappers are divided into structural and functional. Structural mappers consider the circuit graph as a given and find a covering of the graph with K-input subgraphs corresponding to LUTs. The functional approaches perform Boolean decomposition of the logic functions of the nodes into sub-functions of limited support size realizable by individual LUTs. Since functional mappers explore a larger solution space, they tend to be time-consuming, which limits their use to small designs.
In practice, FPGA mapping for large designs is done using structural mappers, whereas functional mappers are used for resynthesis after technology mapping. In this paper, we consider the recent work on DAOmap [2] as representative of advanced structural technology mapping for LUT-based FPGAs, refer to it as the previous work, and discuss several ways of improving it.

Field Programmable Gate Arrays (FPGAs) are an attractive hardware design option, making technology mapping for FPGAs an important EDA problem. For an excellent overview of the classical and recent work on FPGA technology mapping, focusing on area, delay, and power minimization, the reader is referred to [2]. The recent advanced algorithms for FPGA mapping, such as [2][12][16][23], focus on area minimization under delay constraints. If delay constraints are not given, the optimum delay for the given logic structure is found first, and then area is minimized without changing the delay.

LUT for Multipliers: Multiplications can be computationally expensive in most hardware and software implementations.
Various approaches have been proposed in the literature to alleviate this overhead, usually at the cost of multiplication accuracy. One such example is the conversion of multiplication coefficients to dyadic fractions, which can be computed with a minimal sequence of bit shifts and additions. However, such approaches have proved limiting, requiring a lot of hand-tweaking to simultaneously minimize the complexity of the calculation and the deviation from the desired result. Instead, a table-based lookup scheme to implement the multiplication steps is proposed. Whenever a multiplication result is needed, the system simply looks up the result in a precomputed table, with no arithmetic performed at run time. This greatly simplifies the transform and inverse calculations.

It is possible to store binary data within solid-state devices. The storage "cells" within solid-state memory devices are easily addressed by driving the "address" lines of the device with the proper binary values. A ROM can be written, or programmed, with data such that its address lines serve as inputs and its data lines serve as outputs, generating the characteristic response of a particular logic function.

ISSN: 2231-5381 http://www.ijettjournal.org Page 372

Table lookup can replace any coefficient multiplication or unary operation. Although table lookup is often simpler than the actual calculation, the table size grows exponentially with the number of input bits. However, for image and video applications, most signals are unsigned 8-bit values, which require only 256 table entries, so the table-based approach can be implemented at a reasonable cost. Consider a coefficient multiplication where the coefficient is 0.6834. To avoid using a multiplier, traditional lossless transforms approximate the given coefficient with a dyadic fraction (for example, 3/4). The coefficient multiplication can then be implemented using shifts and additions, as shown in the accompanying figure, which also depicts table lookup. Unlike the dyadic fraction case, table-based multiplication yields a much more accurate approximation of the original coefficient.

Literature Survey:

a. Efficient memory-based VLSI array designs for DFT and DCT. Guo, J.-I.; Liu, C.-M.; Jen, C.-W., Nat. Chiao Tung Univ., Hsinchu: Efficient memory-based VLSI arrays and a new design approach for the discrete Fourier transform (DFT) and discrete cosine transform (DCT) are presented. The DFT and DCT are formulated as cyclic convolution forms and mapped into linear arrays characterized by small numbers of I/O channels and low I/O bandwidth.

b. On the design automation of the memory-based VLSI architectures for FIR filters. Lee, H.-R.; Jen, C.-W.; Liu, C.-M., Dept. of Electron. Eng., Nat. Chiao Tung Univ., Hsinchu: An approach to automating the design of memory-based VLSI architectures for FIR filters has been developed. The automation is based on the exploration of the design space and schemes for efficient memory replacement, algorithm formulation, architecture design, and evaluation method.

c. A memory-efficient realization of cyclic convolution and its application to discrete cosine transform: A memory-efficient design for realizing cyclic convolution and its application to the DCT. The method adopts distributed arithmetic computation and exploits the symmetry property of the DCT coefficients to merge the elements in the matrix of the DCT kernel. It then separates the kernel into two perfect cyclic forms to facilitate an efficient realization of the 1-D N-point DCT using (N-1)/2 adders or subtractors, one small ROM module, a barrel shifter, and (N-1)/2 + 1 accumulators.

PROPOSED TECHNIQUE: LUT optimization is the key factor in our project, in order to reduce power and area. The following techniques are implemented in the LUT to achieve this:
1. Anti-symmetric product coding (APC)
2. Modified odd-multiple-storage (OMS)

This project addresses the reduction of look-up-table (LUT) size for memory-based multipliers used in digital signal processing applications. It is shown that by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, an anti-symmetric product coding (APC) scheme is proposed to reduce the LUT size by a further half, where the LUT output is added to or subtracted from a fixed value. It is shown that the optimized LUTs for small input widths can be used for efficient implementation of high-precision LUT multipliers, where the total contribution of all such fixed offsets can be added to the final result or used to initialize successive accumulations. The proposed optimized LUT multiplier is found to involve less area and less multiplication time than the existing LUT multipliers.
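The APC idea described above can be illustrated with a small software model. The following is a minimal sketch, not the authors' hardware design: it assumes a 5-bit unsigned input X and an integer coefficient A, and the names build_apc_lut and apc_multiply are hypothetical. It takes the sign as positive when the most significant bit x4 is 1, since the product A·X lies at or above the fixed offset 16A in that case. The input X = 0 would need an offset distance of 16A, which is outside the 16 stored words, so this sketch simply special-cases it.

```python
# Hypothetical software model of anti-symmetric product coding (APC)
# for a 5-bit unsigned input. Names and the X = 0 handling are assumptions.

def build_apc_lut(A):
    """Return 16 APC words; entry k holds k*A, the distance of a product from 16*A."""
    return [k * A for k in range(16)]


def apc_multiply(A, X, lut):
    """Compute A*X for 0 <= X < 32 using the 16-word APC table and the offset 16*A."""
    assert 0 <= X < 32
    x4 = (X >> 4) & 1   # most significant bit selects add or subtract
    xl = X & 0xF        # four least significant bits
    if x4:
        # X >= 16: the product is 16A plus the stored word at address xl
        return 16 * A + lut[xl]
    if xl == 0:
        # X = 0 needs a distance of 16A, which the 16-word table does not hold;
        # this sketch returns the product directly (an assumption)
        return 0
    # X < 16: the address is the 4-bit two's complement of xl
    addr = (16 - xl) & 0xF
    return 16 * A - lut[addr]


if __name__ == "__main__":
    A = 7  # illustrative coefficient
    lut = build_apc_lut(A)
    print([apc_multiply(A, X, lut) for X in range(32)])
```

Inputs X and 32 - X share one stored word, which is why 16 words suffice for 32 input values.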

International Journal of Engineering Trends and Technology (IJETT) Volume 5 Number 7- Nov 2013

The proposed APC-OMS combined design of the LUT for L = 5 and for any coefficient width W is shown in Fig. 3. It consists of an LUT of nine words of (W + 4)-bit width, a 4-to-9-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and the control word (s1 s0) for the barrel shifter. The precomputed values of A × (2i + 1) are stored as Pi, for i = 0, 1, 2, ..., 7, at the eight consecutive locations of the memory array, as specified in Table II, while 2A is stored for input X = (00000) at LUT address 1000, as specified in Table III. The decoder takes the 4-bit address from the address generator and generates nine word-select signals, {wi, for 0 ≤ i ≤ 8}, to select the referenced word from the LUT. The 4-to-9-line decoder is a simple modification of the 3-to-8-line decoder, as shown in Fig. 4(a). The control bits s0 and s1, used by the barrel shifter to produce the desired number of shifts of the LUT output, are generated by the control circuit according to the relations [equations missing in the source].

APC: [Step 1 missing in the source]
Step 2: Calculate the APC word of X
Step 3: If X(4) = 1 then output <= 16A + APC word(X), else output <= 16A - APC word(X)

OMS:
Step 1: Take the last four bits of X
Step 2: Calculate s0, s1 and the address
Step 3: Depending on s0 and s1, the LUT output is shifted and stored into the final output

Proposed System Architecture: A new approach to LUT design is presented in which only the odd multiples of the fixed coefficient need to be stored; this is referred to as the odd-multiple-storage (OMS) scheme in this brief. In addition, we have shown that, by the anti-symmetric product coding approach, the LUT size can also be reduced by half, where the product words are recoded as anti-symmetric pairs.

Fig: Architecture of present method

If the input bit size is 5, the memory stores 2^5/2 = 16 locations, which is a reduction in LUT size by a factor of 2.
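The nine-word APC-OMS organization described above can be modelled in software as follows. This is a minimal sketch under stated assumptions, not the authors' circuit: the function names are hypothetical, the input is a 5-bit unsigned X, the coefficient A is an integer, and a plain loop stands in for the address generator and the (s1 s0) control logic. The first eight words hold the odd multiples A(2i + 1), the ninth holds 2A at address 1000, a left shift plays the role of the barrel shifter, and a zero output models the RESET signal.

```python
# Hypothetical software model of the combined APC-OMS LUT for L = 5.
# All names are illustrative assumptions.

def build_apc_oms_lut(A):
    """Nine words: P_i = A*(2i+1) for i = 0..7, then 2A at address 0b1000."""
    lut = [A * (2 * i + 1) for i in range(8)]
    lut.append(2 * A)
    return lut


def apc_word(k, lut):
    """Return k*A for 0 <= k <= 16 from one stored word and a left shift."""
    if k == 0:
        return 0                      # models RESET: the LUT output is cleared
    if k == 16:
        return lut[8] << 3            # 2A shifted left by three gives 16A
    shift = 0
    while k % 2 == 0:                 # strip factors of two; they become the shift count
        k >>= 1
        shift += 1
    return lut[(k - 1) // 2] << shift  # odd multiple from the table, then shift


def apc_oms_multiply(A, X, lut):
    """Compute A*X for 0 <= X < 32 as 16A plus or minus the APC word."""
    assert 0 <= X < 32
    x4 = (X >> 4) & 1
    xl = X & 0xF
    # k is in 0..15 when x4 = 1 and in 1..16 when x4 = 0
    k = xl if x4 else 16 - xl
    word = apc_word(k, lut)
    return 16 * A + word if x4 else 16 * A - word


if __name__ == "__main__":
    A = 5  # illustrative coefficient
    lut = build_apc_oms_lut(A)
    print(len(lut), [apc_oms_multiply(A, X, lut) for X in range(32)])
```

In this model the shift count never exceeds three, which matches a two-bit control word for the barrel shifter.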
Hardware Environment: FPGA Implementation

FPGA stands for field programmable gate array, a device that can be configured by the customer or designer after manufacturing. Field programmable gate arrays are so called because, rather than having a structure similar to a PAL or other programmable device, they are structured very much like a gate array ASIC. This makes FPGAs well suited to prototyping ASICs, or to places where an ASIC will eventually be used. For example, an FPGA may be used in a design that needs to get to market quickly regardless of cost. Later, an ASIC can be used in place of the FPGA when the production volume increases, in order to reduce cost. FPGAs are programmed using a logic circuit diagram or source code in an HDL to specify how the chip will work. FPGAs contain programmable logic components called "logic blocks" and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together". The programmable logic blocks are called configurable logic blocks, and the reconfigurable interconnects are called switch boxes. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

Fig: Flow chart of proposed technique
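The way a logic block realizes a combinational function can be pictured as a small memory addressed by its inputs. The sketch below is an illustrative model only, with hypothetical names (make_lut, lut_eval); it does not describe any particular vendor's logic block. It precomputes the truth table of a K-input Boolean function and then evaluates the function by a single table read.

```python
# Hypothetical model of a K-input LUT: a 2^K-entry truth table indexed by the inputs.

def make_lut(func, k):
    """Precompute the truth table of a k-input Boolean function."""
    table = []
    for addr in range(1 << k):
        # input i corresponds to bit i of the address
        bits = [(addr >> i) & 1 for i in range(k)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *inputs):
    """Evaluate the function by forming the address from the input bits."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


if __name__ == "__main__":
    xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)   # 3-input XOR
    and2 = make_lut(lambda a, b: a & b, 2)          # 2-input AND
    print(xor3, lut_eval(xor3, 1, 0, 1), lut_eval(and2, 1, 1))
```

Reconfiguring the block amounts to rewriting the table contents, while the lookup mechanism stays the same.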

SIMULATION RESULT OF LUT OPTIMIZATION:

APPLICATIONS: The applications of LUT optimization for memory-based computation are:
1. Bio-medical: Whole-body wireless operation systems use nano components such as nano cameras and CROs. Nano cameras have to be designed with a small area so that they can be embedded in the human body. LUTs therefore play a vital role in the design of such nano devices.

CONCLUSION: An advanced and efficient LUT-based multiplier is designed using barrel shifters, with a reduction in area. This improves throughput and suits a wide range of applications. This type of LUT plays a vital role in applications such as biomedical, telecommunications, and military systems.

REFERENCES:
[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, 1989.
[2] S. A. White, "Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review", IEEE ASSP Magazine, July 1989, pp. 4-19.
[3] M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Area-Delay Tradeoff in Distributed Arithmetic based Implementation of FIR Filters", VLSI Design 97, pp. 134-129.
[4] S. Wolter, A. Schubert, H. Matz, R. Laur, "On the Comparison between Architectures for the Implementation of Distributed Arithmetic", ISCAS 93, pp. 1829-1832.
[5] K. Nourji and N. Demassieux, "Optimization of Real-Time VLSI Architectures for Distributed Arithmetic based Algorithms: Application to HDTV Filters", ISCAS 94, vol. 4, pp. 223-226.
[6] E. M. Sentovich et al., "SIS: A System for Sequential Circuit Synthesis", Memorandum No. UCB/ERL M92/41.
[8] V. S. Rosa, E. Costa, S. Bampi, "A High Performance Parallel FIR Filters Generation Tool", in Iberchip, San Jose, Costa Rica, 2006.
[9] Altera Corporation, 101 Innovation Drive, San Jose, California 95134, USA. http://www.altera.com
[10] Xilinx, Inc. http://www.xilinx.com
[11] R. W. Hamming, Digital Filters, Prentice Hall, 3rd ed., 1989.
[12] A. K. Sharma, Advanced Semiconductor Memories: Architectures, Designs, and Applications. Piscataway, NJ: IEEE Press, 2003.
[13] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication", in Proc. IEEE ISCAS, May 2009.
[14] P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems", in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1-4.
[15] International Technology Roadmap for Semiconductors. [Online].
[16] P. K. Meher, "New look-up-table optimizations for memory-based multiplication", in Proc. ISIC, Dec. 2009, pp. 663-666.
[17] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "A systolic array architecture for the discrete sine transform", IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2347-2354, Sep. 2002.
[18] H.-C. Chen, J.-I. Guo, T.-S. Chang and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform", IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445-453, Mar. 2005.
[19] P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation", IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041-1050, Sep. 2006.
[20] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication", in Proc. IEEE ISCAS, May 2009, pp. 453-456.