International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013

Design and Implementation of an Enhanced LUT System in Security Based Computation

Dama Dhanalakshmi 1, K. Annapurna 2
Vignan University, Guntur district

ABSTRACT: In this project, the anti-symmetric product coding (APC) and odd-multiple-storage (OMS) techniques for lookup-table (LUT) design for memory-based multipliers are presented for use in digital signal processing applications. Each of these techniques reduces the LUT size by a factor of two. We present a different form of APC and a modified OMS scheme so that the two can be combined for efficient memory-based multiplication. The proposed combined approach reduces the LUT size to one-fourth of the conventional LUT. We also suggest a simple technique for selective sign reversal to be used in the proposed design. It is shown that the proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition.

Keywords: anti-symmetric product coding, odd-multiple-storage, lookup-table, digital signal processing

INTRODUCTION: A look-up table (LUT) size of 4 is the most area efficient in a non-clustered context. A LUT size of 5 to 6 gave the best performance. The work in [12] suggested that a heterogeneous mixture of LUT sizes of 2 and 3 was equivalent in area efficiency to a LUT size of 4 and, hence, could be a good choice. In addition, [1] states that a logic structure using two three-input LUTs was most beneficial in terms of area and speed. However, it must be noted that neither of these last two papers performed a full area or delay study in which a range of LUT sizes was examined.

Prior studies also share several limitations. First, prior work focused on non-clustered logic blocks, which are known to have a significant impact on the area and delay [21]. Second, most prior studies tended to look at area or delay, but not both, as we will here.
Third, prior results were based on IC process generations that are several factors larger than current process generations, and so do not take deep-submicron electrical effects into account. In the present work, we perform detailed transistor-level design of circuits and perform appropriate buffer and transistor sizing for all the logic and routing elements.

Field Programmable Gate Arrays (FPGAs) are an attractive hardware design option, making technology mapping for FPGAs an important EDA problem. For an excellent overview of the classical and recent work on FPGA technology mapping, focusing on area, delay, and power minimization, the reader is referred to [2]. The recent advanced algorithms for FPGA mapping, such as [2][12][16][23], focus on area minimization under delay constraints. If delay constraints are not given, the optimum delay for the given logic structure is found first, and then area is minimized without changing the delay.

In terms of the algorithms employed, mappers are divided into structural and functional. Structural mappers consider the circuit graph as given and find a covering of the graph with K-input subgraphs corresponding to LUTs. Since functional mappers explore a larger solution space, they tend to be time-consuming, which limits their use to small designs. In practice, FPGA mapping for large designs is done using structural mappers, whereas functional mappers are used for resynthesis after technology mapping. In this paper, we consider the recent work on DAOmap [2] as representative of advanced structural technology mapping for LUT-based FPGAs, refer to it as the previous work, and discuss several ways of improving it.

LOOK UP TABLE: LUT means Look Up Table. It is helpful to think of it like a math problem, R = S + L, where R is your result, or what you want to attain; S is your source, or what you start with; and L is your LUT, or the difference needed to make up between your source and your desired outcome. In all cases of LUT use, the LUT is the means to make up the difference between source and result. (All cases assume the colorist, or you, is grading through a correctly calibrated monitor for evaluation and finishing. LUTs in no way replace proper calibration or color correction.)

In computer science, a lookup table is an array that replaces runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than undergoing an "expensive" computation or input/output operation [1]. The tables may be precalculated and stored in static program storage, calculated (or "pre-fetched") as part of a program's initialization phase (memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input.

PROPOSED TECHNIQUE: LUT optimization is the key factor in our project for reducing power and area. The following techniques are implemented in the LUT to obtain optimized results:
1. Anti-symmetric product coding (APC)
2. Modified odd-multiple storage (OMS)

A conventional lookup-table (LUT)-based multiplier is shown in Fig. 1, where A is a fixed coefficient and X is an input word to be multiplied with A. Assuming X to be a positive binary number of word length L, there can be $2^L$ possible values of X and, accordingly, $2^L$ possible values of the product $C = A \cdot X$. Therefore, for memory-based multiplication, an LUT of $2^L$ words, consisting of precomputed product values corresponding to all possible values of X, is conventionally used. The product word $A \cdot X_i$ is stored at the location $X_i$ for $0 \le X_i \le 2^L - 1$, such that if the L-bit binary value of $X_i$ is used as the address for the LUT, then the corresponding product value $A \cdot X_i$ is available as its output.

Fig 1: Conventional LUT-based multiplier

Let the input be X, to be multiplied by A. The products are listed in the second column of the table (Table 1.1: General LUT table). In our design, the product values are stored in LUTs, each in a separate row, and the input data acts as the address that selects the product value. If the input has length 5, then $2^5 = 32$ values must be stored. As the input length increases, more values must be stored, which requires more memory.

This project addresses the reduction of the look-up-table (LUT) size of memory-based multipliers to be used in digital signal processing applications. It is shown that by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, a novel anti-symmetric product coding (APC) scheme is proposed to reduce the LUT size by a further half, where the LUT output is added to or subtracted from a fixed value. It is shown that the optimized LUTs for small input width can be used for efficient implementation of high-precision LUT-multipliers, where the total contribution of all such fixed offsets can be added to the final result or used to initialize successive accumulations. The proposed LUT-multiplier and the existing ones are coded in VHDL and synthesized by Synopsys Design Compiler using the TSMC 90 nm library. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than the existing LUT-multipliers.

The proposed APC-OMS combined design of the LUT for L = 5 and for any coefficient width W is shown in Fig. 3. It consists of an LUT of nine words of (W + 4)-bit width, a four-to-nine-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and the control word ($s_1 s_0$) for the barrel shifter. The precomputed values of $A \times (2i + 1)$ are stored as $P_i$, for $i = 0, 1, 2, \ldots, 7$, at the eight consecutive locations of the memory array, as specified in Table II, while 2A is stored for input X = (00000) at LUT address "1000", as specified in Table III.

The decoder takes the 4-bit address from the address generator and generates nine word-select signals, $\{w_i\}$ for $0 \le i \le 8$, to select the referenced word from the LUT. The 4-to-9-line decoder is a simple modification of a 3-to-8-line decoder, as shown in Fig. 4(a). The control bits $s_0$ and $s_1$, used by the barrel shifter to produce the desired number of shifts of the LUT output, are generated by the control circuit according to the relations [equations not recovered].

Fig: ASM chart of LUT optimization

ALGORITHM:
Step 1: Load the input multiplicand value into the X register.
Step 2: Decide whether to use the APC or the OMS technique.
Step 3: If X(4) = 1, select the APC technique.
Step 4: Else, select the OMS technique.

APC:
Step 1: Take the 2's complement of X and pass it to the next block.
Step 2: Calculate the APC word of X.
Step 3: If X(4) = 1 then Output <= 16A + APC word(X), else Output <= 16A - APC word(X). (For example, X = 31 gives 16A + 15A and X = 1 gives 16A - 15A.)

OMS:
Step 1: Take the last four bits of X.
Step 2: Calculate s0, s1, and the address.
Step 3: Depending on s0 and s1, the LUT output is shifted and stored in the final output.

Fig: Flow chart of proposed technique
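The APC and OMS steps above can be illustrated with a small software model for L = 5. This is only a sketch under stated assumptions, not the authors' VHDL design: the function names are hypothetical, the barrel-shifter control word (s1 s0) is modelled simply as a shift count from 0 to 3, the RESET case for X = (10000) is modelled as an LUT output of zero, and the sign selection follows the anti-symmetry of the products around 16A (added when X(4) = 1, subtracted otherwise).

```python
"""Software sketch of a combined APC-OMS LUT multiplier for 5-bit inputs."""


def build_conventional_lut(a: int, length: int = 5) -> list[int]:
    """Baseline: one stored product per possible input, 2**length words."""
    return [a * x for x in range(2 ** length)]


def build_apc_oms_lut(a: int) -> list[int]:
    """Nine words: odd multiples A*(2i+1) for i = 0..7, then 2A at address 8."""
    lut = [a * (2 * i + 1) for i in range(8)]
    lut.append(2 * a)
    return lut


def apc_oms_multiply(lut: list[int], a: int, x: int) -> int:
    """Return a * x for 0 <= x < 32 using only the nine-word LUT."""
    if not 0 <= x < 32:
        raise ValueError("x must be a 5-bit unsigned value")
    x4 = (x >> 4) & 1   # most significant bit, X(4) in the text
    low = x & 0xF       # the last four bits of X

    if low == 0 and x4 == 1:
        # X = 10000: LUT output is reset to zero, so the result is 16A.
        word = 0
    elif low == 0:
        # X = 00000: 2A at address 8, shifted left three times, gives 16A.
        word = lut[8] << 3
    else:
        # APC address: low bits as-is if X(4) = 1, else their 4-bit 2's complement.
        addr = low if x4 else 16 - low
        # OMS: factor the address into (odd value) * 2**shift, with shift in 0..3.
        shift = 0
        while addr % 2 == 0:
            addr >>= 1
            shift += 1
        word = lut[(addr - 1) // 2] << shift

    offset = a << 4     # the fixed value 16A
    return offset + word if x4 else offset - word


if __name__ == "__main__":
    A = 13
    lut = build_apc_oms_lut(A)
    assert all(apc_oms_multiply(lut, A, x) == A * x for x in range(32))
    print(len(build_conventional_lut(A)), "words vs", len(lut), "words")
    # prints: 32 words vs 9 words
```

The exhaustive check over all 32 inputs is what makes the sketch useful: it confirms that nine stored words plus a shift and one addition or subtraction reproduce every entry of the conventional 32-word table, which is the roughly one-fourth reduction described in the text.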

SIMULATION RESULT OF LUT OPTIMIZATION:

RTL Internal block:

APPLICATIONS: The applications of LUT optimization for memory-based computation are:
1. Communications: Future wireless systems have three mutually conflicting demands: high computational bandwidth, low power consumption, and reconfigurability. Such a set of demands will continue to be a challenge to the designers of computing circuits and systems for next-generation wireless communication. Lookup-table (LUT)-based arithmetic circuits have significant potential to satisfy these requirements to a great extent.
2. The technique is also applicable in DSP processors.
3. The technique is also useful in FIR and FFT processors.

CONCLUSION & FUTURE SCOPE: The proposed LUT-multiplier and the existing ones are coded in VHDL and synthesized by Synopsys Design Compiler using the TSMC 90 nm library. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than the existing LUT-multipliers. Finally, the combined approach reduces the LUT size to one-fourth of the conventional LUT. We will design a simple technique for selective sign reversal to be used in the proposed design. In future work, we plan to further reduce the power consumed by the proposed LUT.

REFERENCES:
[1] International Technology Roadmap for Semiconductors. [Online]. Available: http://public.itrs.net/
[2] J.-I. Guo, C.-M. Liu, and C.-W. Jen, "The efficient memory-based VLSI array design for DFT and DCT," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 10, pp. 723-733, Oct. 1992.
[3] H.-R. Lee, C.-W. Jen, and C.-M. Liu, "On the design automation of the memory-based VLSI architectures for FIR filters," IEEE Trans. Consum. Electron., vol. 39, no. 3, pp. 619-629, Aug. 1993.
[4] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "A systolic array architecture for the discrete sine transform," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2347-2354, Sep. 2002.
[5] H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445-453, Mar. 2005.
[6] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 6, pp. 1125-1137, Jun. 2005.
[7] P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041-1050, Sep. 2006.
[8] P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems," in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1-4.
[9] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009, pp. 453-456.
[10] P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Proc. ISIC, Dec. 2009, pp. 663-666.
[11] A. K. Sharma, Advanced Semiconductor Memories: Architectures, Designs, and Applications. Piscataway, NJ: IEEE Press, 2003.
[12] TSC4000 0.35 µm CMOS Standard Cell, Macro Library Summary, Texas Instruments, Application Specific Integrated Circuits, 1995.

ISSN: 2231-5381, http://www.ijettjournal.org, Pages 3308-3313