An Efficient High Speed Wallace Tree Multiplier


Chepuri Satish, Panem Charan Arur, G. Kishore Kumar and G. Mamatha

Abstract: The Wallace tree multiplier is considered faster than a simple array multiplier and is an efficient digital circuit for multiplying two integers. A Wallace tree multiplier is a parallel multiplier that uses the carry-save addition algorithm to reduce latency. Many researchers have worked on the design of increasingly efficient multipliers, aiming at higher speed and lower power consumption while occupying less silicon area. The Wallace tree basically multiplies two unsigned integers. The new architecture enhances the speed performance of the widely acknowledged Wallace tree multiplier (WTM).

I. Introduction: A multiplier can be divided into three stages. Partial products generation (PPG) is the first stage, in which the multiplicand and the multiplier are multiplied bit by bit to generate the partial products. Partial products reduction (PPR), or partial products addition, is the second stage; it is the most important because it is the most complicated and determines the speed of the overall multiplier. The final addition, or carry-propagate addition (CPA), is the third stage. Different compressors have been widely employed in high-speed multipliers to lower the latency of the partial product accumulation stage. To suit processors used in digital signal processing applications, a modified Wallace tree multiplier uses compressor circuits to obtain low-power and high-speed operation in the Arithmetic Logic Unit (ALU). In digital CMOS design, the well-known power-delay product is commonly used to evaluate the merits of designs.
Chepuri Satish and G. Mamatha are UG students (B.Tech). Panem Charan Arur (M.Tech) is an Assistant Professor in the ECE Department, PINN College, Nellore. G. Kishore Kumar (M.Tech) is an Assistant Professor in the Department of ECE, Mekapati Rajamohan Reddy Institute of Technology and Science, Nellore, India.

PARTIAL PRODUCT USING COMPRESSOR

The proposed multiplier architecture comprises a partial product generation (PPG) stage, a partial product reduction (PPR) stage and the final addition stage, or carry-propagate addition (CPA). In the partial product reduction stage, the latency of the Wallace tree multiplier can be reduced by decreasing the number of adders. In the proposed architecture, multi-bit compressors are used to reduce the number of partial product addition stages. The combination of low power, low transistor count and minimum delay makes the 3:2, 4:2 and 5:2 compressors the appropriate choice. In these compressors, the outputs generated at each stage are used efficiently by replacing the XOR blocks with multiplexer blocks. The select bits of the multiplexers are available well ahead of the inputs, so the critical path delay is minimized.

3:2 Compressor

The 3:2 compressor is a combinational circuit that sums three one-bit binary inputs and returns a one-bit sum and a one-bit carry. It is used as a full adder. Figure 1 shows the block diagram of the 3:2 compressor; it has three inputs X1, X2 and X3 and two outputs, Sum and Carry. The equations governing the outputs of the 3:2 compressor are shown below.

$$\text{Sum} = X_1 \oplus X_2 \oplus X_3$$
$$\text{Carry} = X_1 X_2 + X_2 X_3 + X_3 X_1$$

Figure 1: A 3:2 compressor using universal gates

Table 1 has three 1-bit inputs named X1, X2 and X3 and two outputs, Sum and Carry. For example, a binary input of 001 results in an output of 0 + 0 + 1 = 01. Here the sum bit is Sum = 1, while the carry-out bit is Carry = 0.

4:2 Compressor

The 4:2 compressor has four inputs X1, X2, X3 and X4 and two outputs, Sum and Carry, along with a carry-in (Cin) and a carry-out (Cout). The double pass transistor logic (DPL) implementation of the gate logic structure shown above has been shown to exhibit lower power consumption and higher speed than earlier designs because it reduces the internal load capacitances in the critical path. Using a transmission gate multiplexer in the construction of the compressors further reduces the number of transistors to 8, compared with 12 for a conventional CMOS multiplexer. The use of a 4:2 compressor, on the other hand, reduces the latency to 3. Hence, two full adders can be replaced by a single 4:2 compressor.

Figure 2: A 4:2 compressor (DPL logic)

The block diagram of the 4:2 compressor is composed of two serially connected 3:2 compressors, as shown in Figure 2 (a) and (b). The first 3:2 compressor has three inputs X1, X2 and X3; its sum and carry outputs are S1 and Carry1. The second 3:2 compressor has inputs S1 and X4, with 0 as the third input, and produces Sum and Carry2.

Figure 3: A 4:2 compressor with universal gates

5:2 Compressor

The 5:2 compressor is a combinational circuit that sums five one-bit binary inputs and returns a one-bit sum and a one-bit carry.

Figure 4: Block diagram of a 5:2 compressor using two 3:2 compressors
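The compressor behaviour described above can be checked with a small bit-level model. The following Python sketch is an illustration only, not the authors' gate-level design: the function names are hypothetical, and the 4:2 compressor is assumed to chain two 3:2 compressors with the third input of the second stage used as Cin (which defaults to 0, matching the description in the text).

```python
from itertools import product


def compressor_3_2(x1, x2, x3):
    """3:2 compressor (full adder): returns (sum, carry) for three 1-bit inputs."""
    s = x1 ^ x2 ^ x3
    c = (x1 & x2) | (x2 & x3) | (x3 & x1)
    return s, c


def compressor_4_2(x1, x2, x3, x4, cin=0):
    """4:2 compressor modelled as two chained 3:2 compressors (an assumption).

    Returns (sum, carry, cout). The carry and cout bits both have weight 2.
    """
    s1, cout = compressor_3_2(x1, x2, x3)    # first stage: S1 and Carry1
    s, carry = compressor_3_2(s1, x4, cin)   # second stage: Sum and Carry2
    return s, carry, cout


def check_compressors():
    """Exhaustively confirm that both models preserve the arithmetic sum."""
    for bits in product((0, 1), repeat=3):
        s, c = compressor_3_2(*bits)
        if sum(bits) != s + 2 * c:
            return False
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Calling `compressor_3_2(0, 0, 1)` reproduces the worked example in the text (Sum = 1, Carry = 0), and `check_compressors()` walks every input combination of both models.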
MULTIPLICATION LOGIC

Consider an example of 8-bit multiplication in which the 8-bit multiplicand is X7X6X5X4X3X2X1X0 and the multiplier is Y7Y6Y5Y4Y3Y2Y1Y0. The multiplication process is shown in Figure 4(a). It requires 64 AND operations. First, Y0 is multiplied with X7X6X5X4X3X2X1X0, giving X0Y0, X1Y0, X2Y0, X3Y0, X4Y0, X5Y0, X6Y0 and X7Y0. Next, Y1 is multiplied with X7X6X5X4X3X2X1X0, giving X0Y1, X1Y1, X2Y1, X3Y1, X4Y1, X5Y1, X6Y1 and X7Y1. All remaining multiplications take place in the same way, and each step shifts the result by one binary position. The AND terms are each represented by one bit, numbered sequentially from K0 to K63, as shown in Figure 4(b). After the 64 AND operations are complete, the terms are added as shown in Figure 4(c).

The addition can be done using a tree of compressors. The tree uses 3:2, 4:2 and 5:2 compressors, which is an optimized solution compared with using 3:2 compressors only. The addition is possible with 3:2 compressors alone, but the implementation using 4:2 and 5:2 compressors reduces the latency and increases the speed. The addition using the Wallace tree is shown in the figure. In this process, the sum output of an intermediate compressor is an input to the next compressor in the same column, and the carry generated by each adder is propagated to the adders of the next column. The result has 16 bits, represented by [P15 ... P0].

Figure 5: Multiplier (8 bits)

Figure 6: Multiplication Wallace logic tree

Figure 7: Wallace tree multiplier using optimized compressors

Several popular and well-known schemes have been developed in the past with the objective of improving the speed of the parallel multiplier. Wallace introduced a very important iterative realization of the parallel multiplier. This advantage becomes more pronounced for multipliers larger than 16 bits. In the Wallace tree architecture, all the bits of all of the partial products in each column are added together by a set of counters in parallel without propagating any carries. Another set of counters then reduces this new matrix, and so on, until a two-row matrix is generated. The most common counter used is the 3:2 counter, which is a full adder. The final results are usually added using a carry-propagate adder. The advantage of the Wallace tree is speed, because the addition of partial products is now O(log N).

A block diagram of a 4-bit Wallace tree multiplier is shown below. As seen from the block diagram, the partial products are added in the Wallace tree block. The results of these additions are the final product bits and the sum and carry bits, which are added in the final fast adder (CRA).
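To make the column-wise reduction concrete, the sketch below models the three stages in Python: AND-based partial product generation, repeated reduction layers until at most two bits remain per column, and a final carry-propagate addition. It is a simplified illustration under stated assumptions: it uses 3:2 compressors only (the paper mixes 3:2, 4:2 and 5:2 compressors), it passes leftover bits through unchanged instead of using half adders, and all function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor: returns (sum, carry) for three 1-bit inputs."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)


def partial_products(x, y, n=8):
    """PPG stage: cols[k] holds every AND term x_i & y_j with i + j == k."""
    cols = [[] for _ in range(2 * n)]
    for j in range(n):
        for i in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return cols


def reduce_once(cols):
    """One reduction layer: each group of three bits in a column is compressed.

    The sum bit stays in the same column and the carry bit moves to the
    next column, with no carry propagation inside the layer.
    """
    out = [[] for _ in range(len(cols) + 1)]
    for k, col in enumerate(cols):
        i = 0
        while len(col) - i >= 3:
            s, c = full_adder(col[i], col[i + 1], col[i + 2])
            out[k].append(s)
            out[k + 1].append(c)
            i += 3
        out[k].extend(col[i:])  # leftover bits pass through to the next layer
    return out


def wallace_multiply(x, y, n=8):
    """Multiply two unsigned n-bit integers with a Wallace-style reduction."""
    cols = partial_products(x, y, n)
    while max(len(col) for col in cols) > 2:   # PPR stage
        cols = reduce_once(cols)
    # CPA stage: the two remaining rows are added with an ordinary addition.
    row0 = sum(col[0] << k for k, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(cols) if len(col) > 1)
    return row0 + row1
```

Each layer preserves the weighted sum of all bits, so the final two-row addition yields the product. A hardware design would replace the Python `+` in the last step with a carry-propagate adder.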
Figure 8: Wallace tree using 3:2 compressors

The 3:2 compressors make use of a carry-save adder. The carry-save adder outputs two numbers of the same dimensions as the inputs: one is a sequence of partial sum bits and the other is a sequence of carry bits. In a carry-save adder, the carry digit is taken from the right and passed to the left, just as in conventional addition, but the carry digit passed to the left is the result of the previous calculation and not the current one. So in each clock cycle, carries only have to move one step along, and the clock can tick much faster. The carry-save adder also produces all of its output values in parallel and thus has the same delay as a single full adder.

Figure 22: Wallace tree using 4:2 compressors

The 4:2 compressors have been widely employed in high-speed multipliers to lower the latency of the partial product accumulation stage. A 4:2 compressor can be built using two 3:2 compressors. Owing to its regular interconnection, the 4:2 compressor is ideal for constructing a regularly structured Wallace tree with low complexity.

In this section, the performance measures of the multipliers discussed so far are summarized and compared. These results were obtained after synthesizing the individual architectures targeting the Xilinx FPGA 4052XL-1HQ240C. All comparisons are based on the synthesis reports, keeping one common base for comparison. We summarize Area (total number of CLBs required), Delay and Power Consumption, and also calculate the Delay Power (DP), Area Power (AP), Area Speed (AT) and Area Speed² (AT²) products.

The Serial Parallel multiplier is the best choice when speed is not important and reduced area and power consumption are of more interest; it is also a good choice for the AP and AT products. However, one of the most important performance parameters is AT². From the table we see that the Modified Booth-Wallace tree multiplier is the best choice as far as AT² is concerned. The Serial Parallel multiplier, which is a good choice for the AP and AT products, has the worst performance for AT².
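The product metrics named above can be computed directly from the three synthesis figures. The sketch below is a hedged illustration: it assumes that T in AT and AT² denotes the delay, the class and function names are hypothetical, and the units (CLBs, ns, mW) are placeholders rather than values taken from the paper's table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisResult:
    """Area, delay and power of one synthesized multiplier (units assumed)."""
    area_clbs: float
    delay_ns: float
    power_mw: float


def figures_of_merit(r):
    """Return the DP, AP, AT and AT2 products for one synthesis result."""
    return {
        "DP": r.delay_ns * r.power_mw,         # delay x power
        "AP": r.area_clbs * r.power_mw,        # area x power
        "AT": r.area_clbs * r.delay_ns,        # area x delay
        "AT2": r.area_clbs * r.delay_ns ** 2,  # area x delay squared
    }
```

Because AT² weights the delay quadratically, a slow but small design can rank well on AP and AT and still rank last on AT², which is consistent with the observation about the Serial Parallel multiplier.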
From the table we can see that the delays of the Wallace tree multiplier and the Combined Booth-Wallace tree multiplier are almost the same and are the lowest. Hence they are the fastest among the five multipliers. The DP product is also the lowest for these two multipliers, so they are a good choice for this performance measure.

IMPLEMENTATION RESULTS:

Figure 9: 8-bit multiplier

The device utilization summary reports the device hardware used in the implementation of the chip, such as RAM, ROM, slices and flip-flops. The synthesis report shows the complete details of device utilization, including total memory utilization. The timing details provide the net delay, the minimum period, the minimum input arrival time before the clock and the maximum output required time after the clock. The selected device is 3s250epq208-5; it is the FPGA device at which the design is targeted.

CONCLUSION

The simulation and synthesis of the multiplier were done in Xilinx ISE 14.2, and the design was functionally tested in ModelSim with different test cases. The implementation uses 3:2, 4:2 and 5:2 compressors. The same implementation could be done using 3:2 compressors only, but we optimized the multiplier design by also using 4:2 and 5:2 compressors. With this optimization, the combinational path delay is 14.263 ns and the memory utilization is 144492 kilobytes. In the future, the performance of the multiplier can be enhanced by synthesis on a Virtex-7 FPGA and by implementing an N-bit multiplier.

The results show that the proposed architecture is more efficient than the existing one in terms of delay. This approach may be well suited to high-speed multiplication of numbers larger than 16 bits. The proposed multiplier can be explored for implementing high-performance multipliers in VLSI applications. A Wallace tree multiplier using the Booth algorithm is a very good technique for high-speed applications and can be implemented with different logic styles in VLSI. The work can be extended further by optimizing the multiplier to improve its power consumption.

Chepuri Satish is studying B.Tech (ECE) at Mekapati Rajamohan Reddy Institute of Technology & Science, Udayagiri, SPSR Nellore, AP, India. Email: 403sat@gmail.com.

Panem Charan Arur holds an M.Tech (VLSI System Design) and a B.Tech (ECE). He is now working as an Assistant Professor in the ECE department at Priyadarshini Institute of Technology (PINN), SPSR Nellore, AP, India, and is doing research work on low-power VLSI. He has published in three international journals and attended one international conference, three national-level conferences, two national-level technical seminars and two national-level workshops. His professional association memberships are IAENG, CSIT and IACSIT. He is a review committee member of three international journals. He is now doing research on advanced technologies in VLSI and embedded systems. Email: panem.charan@gmail.com.

G. Mamatha is studying B.Tech (ECE) at Mekapati Rajamohan Reddy Institute of Technology & Science, Udayagiri, SPSR Nellore, AP, India. Email: mamathaambati77@gmail.com.

G. Kishore Kumar holds an M.Tech (Embedded System Design) and a B.Tech (EIE). He is now working as an Assistant Professor in the ECE department at Mekapati Rajamohan Reddy Institute of Technology & Science, Udayagiri, SPSR Nellore, AP, India, and is doing research work on low-power VLSI. He has published in three international journals and attended one international conference, three national-level conferences, two national-level technical seminars and eight national-level workshops. Email: gkishore1303@gmail.com.