SDRAM Controller Based Vedic Multiplier in DWT Processor for Video Processing


International Journal of Scientific and Research Publications, Volume 5, Issue 6, June 2015

Prof Pramod Kumar Naik *, Prof Gurusandesh M *, Prof Arun S Tigadi **, Dr. Hansraj Guhilot ***

* Department of Electronics & Communication Engineering, VCET, Puttur, Karnataka, India
** Department of Electronics & Communications, KLE DR. M.S.S CET, Belgaum, Karnataka, India
*** Principal, K.C. College of Engineering & Management Studies and Research, Thane, Maharashtra, India

Abstract- Real-time video processing has been a subject of research interest over the last decade. Image and video processing techniques are computationally demanding across many application domains. To meet this demand, we design and implement a new architecture: a Vedic DWT processor with a dedicated SDRAM controller that manages data movement for real-time video processing. The design targets real-time DWT video compression and is implemented on a Spartan 6 Atlys FPGA board. Real-time video applications have been implemented on the architecture, and results are presented to demonstrate its applicability and flexibility.

Index Terms- DWT, DCT, SDRAM.

I. INTRODUCTION

Discrete wavelet transform (DWT) decomposes images into multiple sub-bands of low- and high-frequency components. Encoding these sub-band components compresses the image or video. Image compression finds application in every discipline, including the entertainment, medical, defence, industrial and commercial sectors, and the DWT is the core of the compression unit. The DWT involves many computationally intensive mathematical operations that consume considerable time and power. Our focus is on the design of an SDRAM controller that controls data movement during DWT computation. To increase the performance of the DWT processor, we also design and implement a 16*16 Vedic multiplier in the DWT processing unit. This architecture greatly reduces the power consumption of the circuit while increasing the operating speed of the processing unit.

The rapid increase in packing density, clock frequency and computational power of embedded systems has inevitably raised power consumption. For many years to come, device miniaturization and the search for architectures with low power and voltage requirements will continue. This work explores a new DWT architecture that incorporates Vedic multipliers in the hardware design, and determines its power consumption.

II. SYNCHRONOUS CONTROLLER

The usual memory hierarchy of an FPGA includes the data path, the Main Memory Controller (MMC) and the Local Memory Controller (LMC), as shown in figure 1. The MMC controls the SDRAM (Synchronous Dynamic Random Access Memory) and generates the burst signals for the remaining units of the device. The LMC controls the data path and waits for the burst signals from the MMC. The main aim of this paper is to design a synchronous RAM controller that improves the performance metrics of the Vedic multiplier, which acts as one of the components inside the DWT processor.

III. MULTIPLY AND ACCUMULATE UNIT

In most digital signal processing units, the critical operations consist of many multiplications and accumulations. The key focus is therefore to increase the speed of these operations. To that end, we use Vedic mathematics to design a high-speed Multiplier-Accumulator (MAC) unit. Just as computers today contain a dedicated video graphics unit, they may also contain a dedicated MAC unit.

The generalised structure of the MAC unit is shown in figure 3.1. The MAC unit consists of a multiplier implemented in combinational logic, followed by an adder and an accumulator register that stores the result when clocked. The output of the register is fed back to one input of the adder, so that on each clock the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than a shift-and-add multiplier.
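The multiplier, adder and register feedback loop of the MAC unit can be sketched in software. The Python model below is only an illustration under stated assumptions, not the authors' hardware: the class name MacUnit, its parameter names, and the reading of M in Fig 3.1 as extra accumulator guard bits are all choices made for this example.

```python
class MacUnit:
    """Behavioural model of a multiply-accumulate unit (illustrative only)."""

    def __init__(self, n_bits=16, guard_bits=8):
        # n_bits plays the role of N; guard_bits is an assumed reading of M.
        self.n_bits = n_bits
        self.acc_bits = 2 * n_bits + guard_bits
        self.acc = 0  # accumulator register, cleared at start

    def clock(self, a, b):
        """One clock edge: multiply the inputs and add the product to the register."""
        in_mask = (1 << self.n_bits) - 1
        product = (a & in_mask) * (b & in_mask)  # combinational multiplier, 2N bits wide
        acc_mask = (1 << self.acc_bits) - 1
        # Adder output is stored back in the register, wrapping at 2N+M bits.
        self.acc = (self.acc + product) & acc_mask
        return self.acc


# Usage: a three-term dot product, one multiply-accumulate per clock.
mac = MacUnit()
for a, b in zip([1, 2, 3], [4, 5, 6]):
    result = mac.clock(a, b)
print(result)  # 1*4 + 2*5 + 3*6 = 32
```

The wider accumulator is what allows many 2N-bit products to be summed before the register wraps around.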

Fig 3.1 Architecture of MAC Unit in a processor (figure labels: Multiplier, Adder, Accumulator; bus widths N, 2N and 2N+M bits)

IV. 16*16 VEDIC MULTIPLIER ARCHITECTURE FOR DWT PROCESSOR

The proposed Vedic multiplier is based on the Vedic multiplication formulae (Sutras). These Sutras have traditionally been used to multiply two numbers in the decimal number system. In this work, we apply the same ideas to the binary number system to make the proposed algorithm compatible with digital hardware. The multiplier is based on the Urdhva Tiryakbhyam (Vertical & Crosswise) Sutra of ancient Indian Vedic Mathematics, a general multiplication formula applicable to all cases of multiplication.

The DWT video compression architecture built with this Vedic algorithm uses the DSP slices available through System Generator in MATLAB, which lets us carry out various analyses. Parameters such as power consumed, processing time and overall performance can be checked and evaluated. This algorithm is implemented on a Spartan 6 FPGA.

Fig 4.1 Architecture of 16*16 Vedic Multiplier

The architecture of the 16*16 Vedic multiplier is built up from basic blocks. The individual 4*4 Vedic multiplier blocks are implemented in the Verilog hardware description language. Once the functionality of the 4*4 Vedic multiplier was verified, we designed the 8*8 Vedic multiplier. The architecture in Fig 4.1 has four 8*8 Vedic multipliers and three full adders. Instead of the three full adders, more advanced adders such as carry-save or carry-select adders can be used. Compared with other multipliers, the Vedic multiplier has the advantage in gate delay and regularity of structure. The delay of the Vedic multiplier for 16 x 16 bit numbers is 32 ns, while the delays of the Booth and Array multipliers are 37 ns and 43 ns respectively [1].
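The hierarchical construction of the multiplier, four half-width multipliers combined by three additions, can be illustrated with a short Python sketch. It is a software analogy under stated assumptions rather than the authors' Verilog: the function names are hypothetical, and the recursion is carried down to a 2*2 vertical-and-crosswise block instead of stopping at the paper's 4*4 block.

```python
def vedic_2x2(a, b):
    """Urdhva Tiryakbhyam (vertical and crosswise) product of two 2-bit numbers."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    vertical_low = a0 & b0                         # vertical product of the low bits
    crosswise = (a1 & b0) + (a0 & b1)              # crosswise products, 0..2
    vertical_high = (a1 & b1) + (crosswise >> 1)   # vertical high product plus carry
    return vertical_low | ((crosswise & 1) << 1) | (vertical_high << 2)


def vedic_multiply(a, b, width=16):
    """Multiply two unsigned `width`-bit numbers (width a power of two, >= 2)."""
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four half-width multipliers, as in the four 8*8 blocks of the 16*16 design.
    p_ll = vedic_multiply(a_lo, b_lo, half)
    p_hl = vedic_multiply(a_hi, b_lo, half)
    p_lh = vedic_multiply(a_lo, b_hi, half)
    p_hh = vedic_multiply(a_hi, b_hi, half)
    # Three additions combine the partial products.
    cross = p_hl + p_lh                  # adder 1
    upper = (p_hh << half) + cross       # adder 2
    return (upper << half) + p_ll        # adder 3


# Usage: compare against Python's built-in multiplication.
print(vedic_multiply(12345, 54321) == 12345 * 54321)
```

In hardware the four sub-multipliers operate in parallel, so the regular structure is what the design relies on; this sketch only checks the arithmetic.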
The functionality of each block is verified using Xilinx ISE 14.2.

V. DIGITAL VIDEO PROCESSING

Digital video processing is an evergreen domain of research and one of the fastest growing technologies of this century; it therefore poses tremendous challenges to the engineering community. Fast additions and multiplications are of extreme importance in DSP for convolution, discrete Fourier transforms, digital filters, discrete wavelet transforms and similar operations. Because the core computing process is always a multiplication routine, DSP engineers are constantly looking for new algorithms and architectures to improve system performance. Using this algorithm, we can design such a system and implement it on suitable hardware.

Fig 5.1 Block Diagram Approach (blocks: Input Video, Video Processing Block, FPGA Implementation, Output Video)

The block diagram in figure 5.1 gives the complete flow of digital video compression. It consists of an input block, a processing block and an output block, and the entire video processing algorithm is implemented on the chosen FPGA. The video to be processed is received from a real-time camera, which feeds the input video block. This video is processed in the processing block. The main objective is to save real-time video for surveillance or other security measures in restricted areas. Saving the video continuously is a tedious task because it consumes a lot of memory. To reduce memory usage, we compress the real-time video using the DWT video compression technique. The main processing is done according to our algorithm, and the compressed video is then written to the output video file. The algorithm is designed in MATLAB Simulink and finally implemented on an FPGA, with certain modifications in the MAC unit and with the processing controlled by the SDRAM controller.
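The DWT compression step in this flow rests on the one-level sub-band split that Section VII states as equations (7.2) to (7.4). As a minimal sketch, assuming a finite signal that is zero outside its support, that split can be written in Python as follows. The function names and the unnormalised two-tap averaging filter used in the example are illustrative assumptions and are not taken from the paper.

```python
def qmf_highpass(h):
    """High-pass taps from low-pass taps via equation (7.2): g[n] = (-1)^n h[L-1-n]."""
    length = len(h)
    return [((-1) ** n) * h[length - 1 - n] for n in range(length)]


def analyze(x, h, g):
    """One decomposition level following equations (7.3) and (7.4).

    Returns (y1, y2): the low-pass and high-pass outputs after subsampling by 2.
    Samples of x outside 0..len(x)-1 are treated as zero (an assumption).
    """
    def tap(taps, i):
        return taps[i] if 0 <= i < len(taps) else 0

    half = len(x) // 2
    y1 = [sum(x[k] * tap(h, 2 * n - k) for k in range(len(x))) for n in range(half)]
    y2 = [sum(x[k] * tap(g, 2 * n + 1 - k) for k in range(len(x))) for n in range(half)]
    return y1, y2


# Usage with an assumed two-tap low-pass filter h = [1, 1].
h = [1, 1]
g = qmf_highpass(h)          # [1, -1]
low, high = analyze([4, 6, 10, 12], h, g)
print(low, high)             # [4, 16] [2, 2]
```

Each output has half as many samples as the input, which is the halving of time resolution that the sub-band coding section describes. Feeding the low-pass output back into analyze gives the next decomposition level.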

VI. VIDEO COMPRESSION USING DISCRETE WAVELET TRANSFORM

Several techniques can be used to compress images, including the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform. The DCT works by separating images into parts of differing frequencies. During the quantization step, where part of the compression usually occurs, the less important frequencies are discarded, hence the term lossy. Only the most important frequencies are then used to retrieve the image in the decompression process. As a result, the reconstructed image contains some distortion, but the level of distortion can be adjusted during the compression stage. There is some loss of quality in the reconstructed image, yet it is clearly recognizable even though almost 85% of the DCT coefficients were discarded.

Images contain large amounts of information that require large transmission bandwidths, much storage space and long transmission times. It is therefore crucial to compress the image by storing only the essential information needed to reconstruct it. An image can be thought of as a matrix of pixel values. To compress the image, redundancies must be exploited, for example areas where there is little or no change between pixel values. Images with large areas of uniform color therefore have large redundancies, while images with frequent and large changes in color are less redundant and harder to compress.

In general, there are three essential stages in a wavelet transform image compression system: transformation, quantization and entropy coding.

VII. SUB BAND CODING

To calculate the DWT, a signal is passed through a series of filters. The procedure starts by passing the signal sequence through a half-band digital low-pass filter with impulse response h[n]. Filtering a signal is numerically equal to convolving the signal with the impulse response of the filter:

$$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k] \qquad (7.1)$$

A half-band low-pass filter removes all frequencies above half of the highest frequency in the signal. The signal is then passed through a high-pass filter with impulse response g[n]. The two filters are related to each other as

$$h[L-1-n] = (-1)^{n}\, g[n] \qquad (7.2)$$

where L is the filter length. Filters satisfying this condition are known as quadrature mirror filters. After filtering, half of the samples can be eliminated, since the highest frequency in the signal is now half of the original highest frequency. The signal can therefore be subsampled by 2, simply by discarding every other sample. This constitutes one level of decomposition and can be expressed mathematically as

$$Y_1[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[2n-k] \qquad (7.3)$$

$$Y_2[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[2n+1-k] \qquad (7.4)$$

where Y1[n] and Y2[n] are the outputs of the low-pass and high-pass filters, respectively, after subsampling by 2. This decomposition halves the time resolution, since only half the number of samples now characterizes the whole signal. The frequency resolution doubles, because each output spans half the frequency band of the input. This process is called sub-band coding. It can be repeated to further increase the frequency resolution, as shown by the filter bank.

Fig 7.1 Filter Bank

VIII. SYNCHRONOUS RAM CONTROLLER INTERFACES

Figure 8.1 shows the basic interfaces of a synchronous RAM controller.

Fig 8.1 Interfaces of Synchronous Ram Controller

a. Bank Buffers: Each bank buffer in turn refers to one of the banks of the SDRAM. It receives signals from the SDRAM controller (SDRAMC).
b. Bank Scheduler: The bank scheduler selects a particular bank depending on the assigned priorities.
c. Data Handler: The SDRAMC controls read and write operations by sending the appropriate signals to the data handler.
d. Data Buffer: It receives two signals from the SDRAMC and operates accordingly.
e. SDRAM: Here the term SDRAM refers to a CMOS high-speed Synchronous Dynamic Random Access Memory with R rows columns of B bits each. Internally it is organized as a quad-bank DRAM with a synchronous interface.

IX. SYNCHRONOUS RAM CONTROLLER ARCHITECTURE

Fig 8.2 The Synchronous RAM Controller Architecture

The Synchronous RAM Controller architecture has the following modules:

Flexible Logic Controller Module: The bank scheduler sets the priorities and the bank buffers send their requests. Depending on the priority of each bank buffer, a particular bank is selected and its number is sent to the MSC module.

The Bank State Controller (BSC) Module: This module receives the requested bank number and the request type (read or write) from the MSC module and starts its own state machine.

The Main State Controller (MSC) Module: This module receives the request type (read or write) and the bank number, starts its own state machine, and finally generates the proper SDRAM signals such as CAS, RAS, WE and the address lines.

X. RESULTS

Fig 9.1 RTL schematic Results of 16*16 Vedic Multiplier
Fig 9.2 Simulation Results of 16*16 Vedic Multiplier
Fig 9.3 Simulation Result of DWT Video Processing Block
Fig 9.4 RTL results of DWT Video Processing Block
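The request path of the controller in Section IX, priority-based bank selection followed by generation of the RAS, CAS and WE signals, can be sketched in Python. This is a simplified illustration and not the authors' design: the function names, the request tuple layout and the "lowest number wins" priority rule are assumptions, timing constraints and refresh are ignored, and the signal levels are the active-low values of the standard SDR SDRAM command truth table.

```python
# Active-low (RAS#, CAS#, WE#) levels with chip select asserted,
# taken from the standard SDR SDRAM command truth table.
COMMANDS = {
    "ACTIVE": (0, 1, 1),
    "READ": (1, 0, 1),
    "WRITE": (1, 0, 0),
    "PRECHARGE": (0, 1, 0),
    "NOP": (1, 1, 1),
}


def select_bank(requests, priorities):
    """Return the requesting bank with the best priority, or None if idle.

    requests[bank] is None or a pending (op, row, col) tuple.
    A lower priority number wins (an assumption for this sketch).
    """
    pending = [bank for bank, req in enumerate(requests) if req is not None]
    if not pending:
        return None
    return min(pending, key=lambda bank: priorities[bank])


def command_sequence(bank, op, row, col):
    """Commands for one access: open the row, read or write the column, close the row."""
    if op not in ("READ", "WRITE"):
        raise ValueError("op must be 'READ' or 'WRITE'")
    steps = [("ACTIVE", row), (op, col), ("PRECHARGE", None)]
    return [(name, bank, addr, COMMANDS[name]) for name, addr in steps]


# Usage: banks 1 and 3 have pending requests; bank 3 has the better priority.
requests = [None, ("READ", 5, 3), None, ("WRITE", 7, 1)]
priorities = [2, 3, 0, 1]
bank = select_bank(requests, priorities)
for step in command_sequence(bank, *requests[bank]):
    print(step)
```

A real controller would insert NOP cycles between these commands to meet the device timing parameters. The sketch shows only the ordering of commands and the signal levels for each.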

Fig 9.5 RTL MSC schematic
Fig 9.6 Simulation results of MSC simulation
Fig 9.7 Implementation of Algorithm on FPGA

XI. CONCLUSION

A real-time video processing algorithm with a new architecture was implemented on an FPGA. Implementing algorithms of this type on FPGAs has many practical applications, but it also raises issues such as large memory requirements and the need for fast embedded multipliers. To address these issues, the new architecture uses an SDRAM controller to manage the large memory used for processing. We conclude that the DWT video compression algorithm, with a MAC unit comprising a 16*16 Vedic multiplier, was implemented successfully. The algorithm was created in Xilinx System Generator and implemented on a Spartan6-LX45 FPGA board using JTAG.

ACKNOWLEDGMENT

I would like to thank Vivekananda College of Engineering & Technology for providing the facilities and resources for completing this work. I would also like to thank KLECET, Belgaum for their support in completing this work.

REFERENCES

[1] Elamaran, G. Rajkumar, FPGA Implementation of point Processes Using Xilinx System Generator, July 31 2012.
[2] Jharna Majumdar, Darshan K M, Abhijith Vijayendra, Design and Implementation of Video Shot Detection on Field Programmable Gate Arrays, March 2013.
[3] Øyvind A. Sandberg, Jesper Toftenes, Christian Wilhelmsen, System Modeling with Simulink, May 28, 2012.
[4] Kavitkar S. G., Paikrao P. L., Hardware Implementation of Edge Detection Algorithm, February 2014.
[5] R. Dutta, S. Dutta, K. Mitra, Speaker Verification for Security Systems using Spartan 6, August 2012.
[6] Kiranpreet Kaur, Vikram Mutenj, Inderjeet Singh Gill, Fuzzy Logic Based Image Edge Detection Algorithm in MATLAB.
[7] G. T. Shrivakshan, Dr. C. Chandrashekar, A Comparison of Various Edge Detection Techniques used in Image Processing, September 2012.
[8] F. Arandiga, A. Cohin, R. Donat, B. Matei, Edge Detection Insensitive to Changes of Illumination in the Image, September 15 2009.
[9] Prof. Deepa Kundur, Edge Detection in Image and Video.
[10] Abdoule Rjoub, Spiridon Nikolaidis, FPGA Based Canny Edge Detection for Real Time Applications.
[11] FPGA realization of multi-port SDRAM controller in real time image acquisition system, Multimedia Technology (ICMT), 2011 International Conference, 26-28 July 2011.
[12] Synthesizable High Performance SDRAM Controller: Xilinx.
[13] Purushottam D. Chidgupkar and Mangesh T. Karad, The Implementation of Vedic Algorithms in Digital Signal Processing, Global J. of Engng. Educ., Vol. 8, No. 2, 2004, UICEE, Published in Australia.
[14] Himanshu Thapliyal and Hamid R. Arabnia, A Time-Area-Power Efficient Multiplier and Square Architecture Based On Ancient Indian Vedic Mathematics, Department of Computer Science, The University of Georgia, 415 Graduate Studies Research Center, Athens, Georgia 30602-7404, U.S.A.
[15] E. Abu-Shama, M. B. Maaz, M. A. Bayoumi, A Fast and Low Power Multiplier Architecture, The Center for Advanced Computer Studies, The University of Southwestern Louisiana, Lafayette, LA 70504.
[16] Harpreet Singh Dhillon and Abhijit Mitra, A Reduced-Bit Multiplication Algorithm for Digital Arithmetics, International Journal of Computational and Mathematical Sciences, www.waset.org, Spring 2008.
[17] Shamim Akhter, VHDL Implementation of Fast NXN Multiplier Based on Vedic Mathematics, Jaypee Institute of Information Technology University, Noida, 201307 UP, INDIA, 2007 IEEE.
[18] Charles E. Stroud, A Designer's Guide to Built-In Self-Test, University of North Carolina at Charlotte, 2002, Kluwer Academic Publishers, New York, Boston, Dordrecht, London.

AUTHORS

First Author- Prof Pramod Kumar Naik, B.E, M Tech, Vivekananda College of Engineering & Technology, Puttur, Karnataka, India. pramodkumarnaik.ece@vcetputtur.ac.in
Second Author- Prof Gurusandesh M, B.E, M Tech, Vivekananda College of Engineering & Technology, Puttur, Karnataka, India. gurusandeshm.ece@vcetputtur.ac.in
Third Author- Prof Arun S Tigadi, B.E, M Tech, KLE DR. M.S.S CET, Belgaum, Karnataka, India.
Fourth Author- Dr. Hansraj Guhilot, Principal, K.C. College of Engineering & Management Studies and Research, Thane, Maharashtra, India.
Correspondence Author- Prof Pramod Kumar Naik, B.E, M Tech, Vivekananda College of Engineering & Technology, Puttur, Karnataka, India. pramodkumarnaik.ece@vcetputtur.ac.in. Mobile: 9481772690