International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013

Design and Implementation of an Enhanced LUT System in Security Based Computation

Dama Dhanalakshmi 1, K. Annapurna 2
Vignan University, Guntur district

ABSTRACT: In this project, the anti-symmetric product coding (APC) and odd-multiple-storage (OMS) techniques for lookup-table (LUT) design for memory-based multipliers are presented for use in digital signal processing applications. Each of these techniques reduces the LUT size by a factor of two. We present a different form of APC and a modified OMS scheme, in order to combine them for efficient memory-based multiplication. The proposed combined approach reduces the LUT size to one-fourth of the conventional LUT. A simple technique for selective sign reversal, to be used in the proposed design, is also suggested. It is shown that the proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition.

Keywords: anti-symmetric product coding, odd-multiple-storage, lookup-table, digital signal processing

INTRODUCTION: A look-up table (LUT) size of 4 has been reported to be the most area-efficient in a nonclustered context, while a LUT size of 5 to 6 gave the best performance. The work in [12] has suggested that using a heterogeneous mixture of LUT sizes of 2 and 3 was equivalent in area efficiency to a LUT size of 4 and, hence, could be a good choice. In addition, [1] states that a logic structure using two three-input LUTs was most beneficial in terms of area and speed. However, it must be noted that neither of these last two papers performed a full area or delay study in which a range of LUT sizes was examined. First, prior work focused on nonclustered logic blocks, which are known to have a significant impact on the area and delay [21]. Second, most prior studies tended to look at area or delay, but not both, as we do here.
Third, prior results were based on IC process generations that are several factors larger than current process generations, and so do not take deep-submicron electrical effects into account. In the present work, we perform detailed transistor-level design of circuits and perform appropriate buffer and transistor sizing for all the logic and routing elements.

Field Programmable Gate Arrays (FPGAs) are an attractive hardware design option, making technology mapping for FPGAs an important EDA problem. For an excellent overview of the classical and recent work on FPGA technology mapping, focusing on area, delay, and power minimization, the reader is referred to [2]. The recent advanced algorithms for FPGA mapping, such as [2][12][16][23], focus on area minimization under delay constraints. If delay constraints are not given, the optimum delay for the given logic structure is found first, and then area is minimized without changing the delay.

In terms of the algorithms employed, mappers are divided into structural and functional. Structural mappers consider the circuit graph as a given and find a covering of the graph with K-input subgraphs corresponding to LUTs. Since functional mappers explore a larger solution space, they tend to be time-consuming, which limits their use to small designs. In practice, FPGA mapping for large designs is done using structural mappers, whereas functional mappers are used for resynthesis after technology mapping. In this paper, we consider the recent work on DAOmap [2] as representative of advanced structural technology mapping for LUT-based FPGAs, refer to it as the previous work, and discuss several ways of improving it.

ISSN: 2231-5381 http://www.ijettjournal.org Page 3308

LOOK UP TABLE: LUT means Look Up Table. It is helpful to think of it like a math problem: R = S + L, with R being your result, or what you want to attain; S being your source, or what you start with; and L being your LUT, or the difference needed to make up between your
source and your desired outcome. In all cases of LUT use, the LUT is the means to make up the difference between source and result. (All cases assume that the colorist, or you, is grading through a correctly calibrated monitor for evaluation and finishing. LUTs in no way replace proper calibration or color correction.)

In computer science, a lookup table is an array that replaces runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than undergoing an 'expensive' computation or input/output operation [1]. The tables may be precalculated and stored in static program storage, calculated (or "pre-fetched") as part of a program's initialization phase (memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input.

A conventional lookup-table (LUT)-based multiplier is shown in Fig. 1, where A is a fixed coefficient, and X is an input word to be multiplied with A. Assuming X to be a positive binary number of word length L, there can be $2^L$ possible values of X, and accordingly, there can be $2^L$ possible values of the product $C = A \cdot X$. Therefore, for memory-based multiplication, an LUT of $2^L$ words, consisting of precomputed product values corresponding to all possible values of X, is conventionally used. The product word $A \cdot X_i$ is stored at the location $X_i$ for $0 \le X_i \le 2^L - 1$, such that if an L-bit binary value of $X_i$ is used as the address for the LUT, then the corresponding product value $A \cdot X_i$ is available as its output.

Fig. 1: Conventional LUT-based multiplier

Let the input be X, which is to be multiplied by A. The products are as shown in the second column of the table. In our design, the product values are stored in LUTs, with each product value stored in a separate row. The input data acts as the address that selects the product value. If the input is of length 5, then $2^5 = 32$ values must be stored. As the input length increases, more data must be stored, which requires more memory.

PROPOSED TECHNIQUE: Present technique: LUT optimization is the key factor in our project, in order to reduce power and area. The following techniques are implemented in the LUT to obtain optimized results.
1. Anti-symmetric product coding (APC)
2. Modified odd-multiple storage (OMS)

This project addresses the reduction of the look-up-table (LUT) size of memory-based multipliers to be used in digital signal processing applications. It is shown that by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, a novel antisymmetric product coding (APC) scheme is proposed to
reduce the LUT size by a further half, where the LUT output is added to or subtracted from a fixed value. It is shown that the optimized LUTs for small input widths can be used for efficient implementation of high-precision LUT-multipliers, where the total contribution of all such fixed offsets can be added to the final result or used to initialize successive accumulations. The proposed LUT-multiplier and the existing ones are coded in VHDL and synthesized by Synopsys Design Compiler using the TSMC 90-nm library. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than the existing LUT-multipliers.

Table 1.1: General LUT table

The proposed APC-OMS combined design of the LUT for L = 5 and for any coefficient width W is shown in Fig. 3. It consists of an LUT of nine words of (W + 4)-bit width, a four-to-nine-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and the control word (s1s0) for the barrel shifter. The precomputed values of $A \cdot (2i + 1)$ are stored as $P_i$, for i = 0, 1, 2, ..., 7, at the eight consecutive locations of the memory array, as specified in Table II, while 2A is stored for
input X = (00000) at LUT address 1000, as specified in Table III. The decoder takes the 4-bit address from the address generator and generates nine word-select signals, i.e., $\{w_i, \text{ for } 0 \le i \le 8\}$, to select the referenced word from the LUT. The 4-to-9-line decoder is a simple modification of a 3-to-8-line decoder, as shown in Fig. 4(a). The control bits s0 and s1, used by the barrel shifter to produce the desired number of shifts of the LUT output, are generated by the control circuit, according to the relations

Fig: ASM chart of LUT optimization

ALGORITHM:
Step 1: Load the input multiplicand value into the X register.
Step 2: Decide whether to use the APC or the OMS technique.
Step 3: If X(4) = 1, select the APC technique.
Step 4: Else select the OMS technique.

APC:
Step 1: Take the 2's complement of X and pass it to the next block.
Step 2: Calculate the APC word of X.
Step 3: If X(4) = 1 then Output <= 16A - APC word(X), else Output <= 16A + APC word(X).

OMS:
Step 1: Take the last four bits of X.
Step 2: Calculate s0, s1 and the address.
Step 3: Depending on s0 and s1, the output is shifted and stored into the final output.

Fig: Flow chart of proposed technique
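The steps above can be illustrated with a small behavioral model. The following Python sketch is not the VHDL design described in this paper; it is a software model, under stated assumptions, of a combined APC-OMS lookup for L = 5 with a nine-word LUT. The function names `build_oms_lut` and `apc_oms_multiply` and the example coefficient 13 are hypothetical. The sketch assumes the identity A·X = 16A ± (APC word), with the word added when X(4) = 1 and subtracted otherwise, because that is the sign convention that reproduces A·X in this model; the step list above writes the signs the other way round, which may reflect a different definition of the APC word. The RESET handling for X = (10000) is likewise an assumption about what the RESET signal does.

```python
from typing import List


def build_oms_lut(a: int) -> List[int]:
    """Nine-word LUT: P_i = A*(2i+1) at addresses 0..7, and 2A at address 8."""
    return [a * (2 * i + 1) for i in range(8)] + [2 * a]


def apc_oms_multiply(a: int, x: int, lut: List[int]) -> int:
    """Model of the combined APC-OMS lookup for a 5-bit input x (assumed scheme)."""
    if not 0 <= x < 32:
        raise ValueError("x must be a 5-bit unsigned value")

    x4 = (x >> 4) & 1          # most significant bit, selects add or subtract
    x_low = x & 0xF            # four least significant bits

    # APC address: the low bits as-is when x4 = 1, their 4-bit 2's complement otherwise.
    xp = x_low if x4 else (-x_low) & 0xF

    if xp == 0:
        if x4:
            # X = 10000: assumed RESET case, LUT output forced to 0, result is 16A.
            apc_word = 0
        else:
            # X = 00000: 2A stored at address 1000, shifted left three times gives 16A.
            apc_word = lut[8] << 3
    else:
        # OMS: write xp as (odd value) * 2**shifts; only odd multiples are stored.
        shifts = 0
        while xp % 2 == 0:
            xp >>= 1
            shifts += 1
        # The shift count (0..3) plays the role of the barrel-shifter control word.
        apc_word = lut[(xp - 1) // 2] << shifts

    # Assumed sign convention: add for x4 = 1, subtract for x4 = 0.
    return 16 * a + apc_word if x4 else 16 * a - apc_word


if __name__ == "__main__":
    A = 13                                        # hypothetical coefficient
    lut = build_oms_lut(A)
    conventional = [A * x for x in range(32)]     # conventional 32-word LUT
    assert all(apc_oms_multiply(A, x, lut) == conventional[x] for x in range(32))
    print(len(conventional), "words vs", len(lut), "words")
```

In this model the conventional table needs 32 words, while the combined table needs nine, which is roughly the one-fourth reduction claimed for the combined approach.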
SIMULATION RESULT OF LUT OPTIMIZATION:

RTL Internal block:

APPLICATIONS: The applications of LUT optimization for memory-based computation are:
1. Communications: Future wireless systems have three mutually conflicting demands, namely high computational bandwidth, low power consumption, and reconfigurability. Such a set of demands will continue to be a challenge to the designers of computing circuits and systems for next-generation wireless communication. Lookup-table (LUT)-based arithmetic circuits have significant potential to satisfy these requirements to a great extent.
2. The technique is also applicable in DSP processors.
3. The technique is also useful in FIR and FFT processors.

CONCLUSION & FUTURE SCOPE: The proposed LUT-multiplier and the existing ones are coded in VHDL and synthesized by Synopsys Design Compiler using the TSMC 90-nm library. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than the existing LUT-multipliers. Finally, the combined approach reduces the LUT size to one-fourth of the conventional LUT. We will design a simple technique for selective sign reversal to be used in the proposed design. In future work, we intend to further reduce the power consumed by the proposed LUT.

REFERENCES:
[1] International Technology Roadmap for Semiconductors. [Online]. Available: http://public.itrs.net/
[2] J.-I. Guo, C.-M. Liu, and C.-W. Jen, "The efficient memory-based VLSI array design for DFT and DCT," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 10, pp. 723-733, Oct. 1992.
[3] H.-R. Lee, C.-W. Jen, and C.-M. Liu, "On the design automation of the memory-based VLSI architectures for FIR filters," IEEE Trans. Consum. Electron., vol. 39, no. 3, pp. 619-629, Aug. 1993.
[4] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "A systolic array architecture for the discrete sine transform," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2347-2354, Sep. 2002.
[5] H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 445-453, Mar. 2005.
[6] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 6, pp. 1125-1137, Jun. 2005.
[7] P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041-1050, Sep. 2006.
[8] P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems," in Proc. 6th Int. Conf. ICICS, Dec. 2007, pp. 1-4.
[9] P. K. Meher, "New approach to LUT implementation and accumulation for memory-based multiplication," in Proc. IEEE ISCAS, May 2009, pp. 453-456.
[10] P. K. Meher, "New look-up-table optimizations for memory-based multiplication," in Proc. ISIC, Dec. 2009, pp. 663-666.
[11] A. K. Sharma, Advanced Semiconductor Memories: Architectures, Designs, and Applications. Piscataway, NJ: IEEE Press, 2003.
[12] "TSC4000 0.35-μm CMOS Standard Cell, Macro Library Summary," Texas Instruments, Application Specific Integrated Circuits, 1995.