Audio Compression Technology for Voice Transmission

1 SUBRATA SAHA, 2 VIKRAM REDDY
1 Department of Electrical and Computer Engineering, 2 Department of Computer Science
University of Manitoba, Winnipeg, Manitoba, CANADA

Abstract: Digitized voice is transmitted in many kinds of communication. To transmit voice, the analog voice message is first sampled and converted into a digital signal; the signal is then encoded and finally transmitted. In order to minimize traffic over the network, the voice message is compressed before transmission. The compression and decompression processes should not take much time. In cellular technology, moreover, compression and decompression need to be implemented at the hardware level; if they require complex hardware, they may not be effective. In this paper a very simple, linear, effective and easy-to-implement compression and decompression technique is proposed. Our proposed technique keeps track of changes in the digitized voice: treating the digitized signal as a graph of amplitude vs. time, it tracks the changes in direction of the wave. The proposed technique is not a lossless compression scheme, but the noise it introduces is very small and within an acceptable range.

Keywords: Signal wave, Sharp edge, PCX compression.

1 Introduction

Voice transfer plays a major role in today's communication. Voice, in the form of digital data, is transmitted from one node to another over a network. Voice transfer is needed in many kinds of communication, such as Internet telephony using voice over IP, cellular telephony, popular messengers such as Yahoo Messenger and MSN Messenger, online conferencing, online radio services and many other technologies. In any kind of voice transmission, the analog voice message is first sampled and thus converted from an analog signal to a digital signal. The digitized signal is then encoded and finally transmitted. The quality of service depends on the data transmission rate during the ongoing service, and a large amount of traffic degrades the quality of service. In order to minimize traffic, the digitized voice message is compressed, and the compressed message is then transmitted. At the receiver end the compressed signal is received and decompressed: the sender performs compression and the receiver performs decompression. The amount of traffic on the network is inversely related to the amount of compression achieved. A strong compression scheme is obviously preferable, because it minimizes traffic and thus helps the signal to be transmitted quickly. However, algorithms that achieve high compression take a long time to compress and decompress, and this introduces delay into ongoing voice transmission. In cellular phones, extra hardware is added for compression and decompression; this hardware should be very simple and easy to implement. If the algorithm is too complex, the required hardware may also be complex, so a very simple algorithm is needed.

An audio signal can be segmented in different ways, and the signal can be encoded further depending on the segments. Segmentation using Bayesian change-point detection [5] can be applied to detect sudden changes in a signal. Our method also detects changes in the signal, but it detects them at the magnitude level, not at the frequency level.

1.1 Problem Definition

We consider the problem of encoding the signal after sampling.
In existing techniques the voice message is sampled at small, regular time intervals and the sampled data are encoded. We introduce a new method of encoding: our algorithm compresses the signal to a significant degree.

The complexity of compression and decompression in our technique is very low, and the method is straightforward and thus very easy to implement.

1.2 Paper Organization

The remainder of the paper is organized as follows. Section 2 discusses some related compression techniques. Section 3 introduces our technique. Section 4 analyzes the performance of our technique. Section 5 contains the conclusion and some future work on this method.

2 Some Related Works

Audio signal encoding has been a challenge for many years. A large number of methods exist for signal segmentation. Segmentation is mainly based on searching for change points using suitable signal parameters. Many reliable methods are based on maximum-likelihood and Bayesian approaches [2][3]. Bayesian detectors are very effective because they remove nuisance parameters from the analysis through a marginalization process.

RLE, or run-length encoding [1][6], is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and a count, rather than as the original run. This is most useful on data that contains many such runs, for example simple graphic images such as icons and line drawings. Data with long sequential runs of bytes (such as lower-quality sound samples) can be RLE-compressed after delta encoding [7] is applied to it. Delta encoding is a way of storing data as differences (deltas) between sequential data items rather than as the items themselves. It is sometimes called delta compression, because in some cases the encoded data are shorter than the non-encoded data.

Delta modulation [4] is used for transmission. It is an analog-to-digital signal conversion in which (a) the analog signal is approximated with a series of segments, (b) each segment of the approximated signal is compared to the original analog wave to determine the increase or decrease in relative amplitude, (c) the decision process for establishing the state of successive bits is determined by this comparison, and (d) only the change information is sent, i.e., only an increase or decrease of the signal amplitude from the previous sample is transmitted, whereas a no-change condition causes the modulated signal to remain at the same 0 or 1 state as the previous sample.

PCX compression [8] is one form of run-length encoding. It is used as a format for saving pictures; if bitmap pictures are stored in the PCX format, they take much less space.

All the techniques mentioned above are lossless compression schemes: after decompressing the encoded signal, the original data (signal) is recovered. Our proposed method, in contrast, is a lossy compression scheme.
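
To make the delta-then-RLE idea above concrete, here is a minimal C sketch. It illustrates the related techniques, not the method proposed in this paper: it delta-encodes a buffer of 8-bit samples and then run-length encodes the deltas as (delta, count) pairs. The sample values and the function name delta_then_rle are hypothetical.

#include <stdio.h>

/* Sketch only: delta-encode 8-bit samples, then run-length encode the deltas
 * as (delta, count) pairs. Slowly varying audio yields long runs of equal
 * deltas, so the RLE pass shortens the stream. */
static size_t delta_then_rle(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, k = 0;
    while (i < n) {
        /* delta of the current sample with respect to the previous one
           (the first sample is taken relative to 0) */
        unsigned char d = (unsigned char)(in[i] - (i ? in[i - 1] : 0));
        unsigned char count = 1;
        /* extend the run while the following samples have the same delta */
        while (i + count < n && count < 255 &&
               (unsigned char)(in[i + count] - in[i + count - 1]) == d)
            count++;
        out[k++] = d;       /* the repeated delta value  */
        out[k++] = count;   /* how many times it repeats */
        i += count;
    }
    return k;               /* length of the encoded stream in bytes */
}

int main(void)
{
    unsigned char samples[] = {10, 10, 10, 12, 14, 16, 16, 16, 16, 15, 14, 13};
    unsigned char enc[2 * sizeof samples];
    size_t len = delta_then_rle(samples, sizeof samples, enc);
    printf("encoded %zu bytes from %zu samples\n", len, sizeof samples);
    return 0;
}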

3 Encoding Voice Message

When a voice signal is sampled and digitized, a plot of the signal looks like a graph of amplitude vs. time. Figure 1 shows such a graph; it represents a simple voice signal of duration 0.058 second recorded at 11025 Hz. If we analyze the signal carefully, we see that its amplitude varies over time: sometimes it increases with time, sometimes it decreases, and sometimes it stays the same. We can therefore define three kinds of run for the signal: a) gradually increasing, b) gradually decreasing, and c) staying the same. Figure 2 shows the three runs; it is an enlarged, partial view of figure 1. From point a to point b the signal is on an increasing run, from b to c it is on a same run, and from c to d it is on a decreasing run. Our method encodes the signal using this concept.

3.1 Introducing Our Method

Our method detects the three runs above and extracts only the end points of each run. The encoded message is thus the combination of the end points of the separate runs in the original signal. When the encoded signal is decoded, we get a straight line for each run in the original signal. As an example, if we encode the signal shown in figure 2, the portion a-b-c-d of the signal is replaced by three straight lines (one from a to b, one from b to c and one from c to d).

Figures 1 and 2 are drawn for a signal recorded at 11025 Hz. The portion from a to d consists of 61 samples (figure 1 has been drawn with 640 samples), so 61 bytes are needed to store the a-b-c-d portion. For this portion our method saves the following items sequentially: the amplitude of point a, the number of samples between a and b, the amplitude of point b, the number of samples between b and c, the amplitude of point c, the number of samples between c and d, and the amplitude of point d. Our method therefore saves only 7 items and thus takes 7 bytes to store this portion of the signal. In this method the smoothness of the original signal is ignored, but since the voice is recorded at a high sampling frequency, the deviation is very small and the decoded signal is only lightly distorted. Figure 3 shows the amount of distortion: figure 3a shows the a-b-c-d portion of the original signal, figure 3b shows that portion as encoded by our method, and figure 3c shows the superimposition of the signal obtained by our method on the original signal. The grayed portion expresses the amount of distortion.

In the encoded signal we store only the end points of the three types of run, so during decoding the signal must be reconstructed from the end points alone. For instance, in figure 3 there are n-1 samples between point a and point b, i.e. point b is the nth sample from point a (in the data we collected and plotted there are 22 samples between point a and point b). In the encoded stream only the magnitude of a, the magnitude of b and n are stored, and all n-1 intermediate points have to be calculated during decoding in order to reconstruct the signal. Since every individual run of the original signal is replaced by a straight line, the magnitude of the ith point (0 < i < n) between a and b is ma + (mb - ma) * i / n, where ma and mb are the magnitudes of points a and b respectively.
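
As a small numerical illustration of this linear reconstruction rule (the values ma = 100, n = 7 and mb = 121 are hypothetical, not taken from the recorded signals discussed above), the following C sketch rebuilds the interior samples of one run from its stored triple:

#include <stdio.h>

/* Sketch: reconstruct one run from its stored start amplitude ma, sample
 * count n and end amplitude mb, using mi = ma + (mb - ma) * i / n.
 * The values 100, 7 and 121 are hypothetical byte amplitudes. */
int main(void)
{
    int ma = 100, n = 7, mb = 121;
    for (int i = 1; i <= n; i++) {
        int mi = ma + (mb - ma) * i / n;   /* i-th reconstructed sample */
        printf("sample %d = %d\n", i, mi);
    }
    return 0;
}

The last iteration (i = n) reproduces mb exactly, so consecutive reconstructed runs join at their shared end points.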


3.2 The Algorithm

In this paper we present the complete encoding and decoding techniques: first the algorithm for encoding the original signal, then the algorithm for decoding the encoded signal. We consider each sample to be an 8-bit value.

3.2.1 Algorithm Encode

Here GetNextSample() is a function that samples the voice message and returns the sampled value.

Input: The original signal stream, i.e. the sampled voice message.
Output: Encoded signal.

Procedure Encode ( )
    define SAME = 0
    define INCREASING = 1
    define DECREASING = 2
    variables:
        v1, v2 : BYTE
        status : BYTE
        encoded_stream : Array of BYTE
        i, n : integer

    i = 0
    v1 = GetNextSample( )            // store the first sample
    encoded_stream[i] = v1
    v2 = GetNextSample( )
    // initialize first run
    if ( v2 > v1 ) status = INCREASING
    else if ( v2 < v1 ) status = DECREASING
    else status = SAME
    // initialization complete
    v1 = v2                          // v1 always holds the previous sample
    n = 1
    while ( message not end )
        v2 = GetNextSample( )
        if (( status = INCREASING and v2 > v1 ) or
            ( status = DECREASING and v2 < v1 ) or
            ( status = SAME and v2 = v1 ))
            // on the same run
            n = n + 1
        else
            // the run ends: save it and start the next run
            i = i + 1
            encoded_stream[i] = n    // store the number of samples on the run
            i = i + 1
            encoded_stream[i] = v1   // store the last sample of the run
            // initialize next run
            if ( v2 > v1 ) status = INCREASING
            else if ( v2 < v1 ) status = DECREASING
            else status = SAME
            // initialization complete
            n = 1
        } //end if
        v1 = v2                      // remember the previous sample
    } //end while
    i = i + 1
    encoded_stream[i] = n            // flush the final run
    i = i + 1
    encoded_stream[i] = v1
    return encoded_stream
}//end Procedure

The size of the encoded signal (encoded_stream) that we obtain is much smaller than the original signal. This algorithm can be implemented while the original signal is being sampled.
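
A possible C rendering of the encoding procedure is sketched below, purely as an illustration. It assumes the samples are already available in an in-memory array instead of being read through GetNextSample(), that a run length fits in one byte (longer runs are simply split), and that the output buffer is large enough; none of these details are fixed by the paper.

#include <stddef.h>

enum { SAME = 0, INCREASING = 1, DECREASING = 2 };

/* Sketch of the run-based encoder: emits the first sample, then a
 * (count, end-sample) pair for every monotone run in the input.
 * Assumes out[] can hold up to 2 * n_samples + 1 bytes. */
static size_t encode(const unsigned char *in, size_t n_samples, unsigned char *out)
{
    size_t k = 0;
    if (n_samples == 0)
        return 0;

    unsigned char v1 = in[0];
    out[k++] = v1;                       /* first sample of the signal */
    if (n_samples == 1)
        return k;

    unsigned char v2 = in[1];
    int status = (v2 > v1) ? INCREASING : (v2 < v1) ? DECREASING : SAME;
    unsigned char run = 1;               /* samples counted in the current run */
    v1 = v2;

    for (size_t i = 2; i < n_samples; i++) {
        v2 = in[i];
        int same_run = (status == INCREASING && v2 > v1) ||
                       (status == DECREASING && v2 < v1) ||
                       (status == SAME       && v2 == v1);
        if (same_run && run < 255) {
            run++;
        } else {
            out[k++] = run;              /* length of the finished run */
            out[k++] = v1;               /* last sample of that run    */
            status = (v2 > v1) ? INCREASING : (v2 < v1) ? DECREASING : SAME;
            run = 1;
        }
        v1 = v2;
    }
    out[k++] = run;                      /* flush the final run */
    out[k++] = v1;
    return k;                            /* encoded length in bytes */
}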


3.2.2 Algorithm Decode

Input: Encoded signal.
Output: Decoded signal.

Procedure Decode ( )
    variables:
        v1, v2 : BYTE
        encoded_stream, decoded_stream : Array of BYTE
        p, i, j, n : integer

    p = 0
    j = 1
    decoded_stream[p] = encoded_stream[0]
    v1 = encoded_stream[0]
    while ( encoded_stream not end )
        // read number of samples in the run
        n = encoded_stream[j]
        j = j + 1
        // read end point (last sample) of this run
        v2 = encoded_stream[j]
        j = j + 1
        for i = 1 to n do
            // make this run
            p = p + 1
            decoded_stream[p] = v1 + (v2 - v1) * i / n
        } //end for
        v1 = v2                  // the end of this run starts the next one
    } //end while
    return decoded_stream
}//end Procedure

4 Performance Analysis

The method can be implemented during sampling, so no extra time is required for encoding; at the receiver end the signal can be decoded as soon as it is received. The complexity of our algorithm is only O(n). Both the encoder and the decoder circuits can be implemented in hardware using only a comparator, a counter and a few other basic gates. The system is also parallelizable: encoding and decoding can run in parallel. Since voice is sampled at a high frequency, the distortion introduced by our technique is very low. The compression achieved by our method is higher for lower sampling rates; if the sampling rate is higher, less compression is achieved, but the distortion is also lower at higher sampling rates.

We have analyzed the performance on several recorded voices, recorded at 11025 Hz and 22050 Hz. On average, for the voice signals recorded at 11025 Hz our method compresses the signal by 70.4%, i.e. the size of the encoded signal is 29.6% of the original signal. For the voice signals recorded at 22050 Hz, our method achieves 65.2% compression on average.

Figure 4 shows how we have calculated the amount of distortion. We calculate the rms (root mean square) value of the distortion. Suppose we have analyzed a signal of n samples. F1, F2, F3, ..., Fn are the sampled values, i.e. the series Fi (1 <= i <= n) is the original signal, and the series fi (1 <= i <= n) is the decoded signal. Not all fi are equal to Fi; the amount of distortion at the ith sample is Fi - fi. The rms value of the total distortion is sqrt(average((Fi - fi)^2)) for i = 1 to n. Let x be the number of sampling levels; as we have sampled the signal in bytes (i.e. 2^8 = 256 levels), x = 256 in our analysis. So the percentage of distortion is (sqrt(average((Fi - fi)^2)) * 100) / x. The distortion obtained by our process for the signals recorded at 11025 Hz is 1.5%; for the signals recorded at 22050 Hz it is 1.1%.
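
The distortion percentage defined above can be computed directly from the original and decoded sample streams. The following C sketch follows that formula, assuming both streams are available as byte arrays of equal length and using x = 256 quantization levels; the function name is hypothetical.

#include <math.h>
#include <stddef.h>

/* Percentage distortion = 100 * rms(F - f) / x, with x = 256 levels,
 * following the formula in Section 4. */
static double distortion_percent(const unsigned char *F,  /* original samples */
                                 const unsigned char *f,  /* decoded samples  */
                                 size_t n)
{
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)F[i] - (double)f[i];   /* distortion at sample i */
        sum_sq += d * d;
    }
    double rms = sqrt(sum_sq / (double)n);
    return rms * 100.0 / 256.0;
}

By this measure, the reported 1.5% and 1.1% figures correspond to rms errors of roughly 4 and 3 quantization levels, respectively.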

5 Conclusion

There are many techniques for encoding a voice signal; we have shown a completely different method for doing so. This encoding method cannot keep the original signal intact: the signal is slightly distorted, i.e. it is a lossy compression scheme. When we encode the original signal, the encoded signal we obtain is much smaller than the original, i.e. the compression is very high, and when we decode the encoded signal only a very small distortion, within an acceptable level, takes place. Lossy compression can be applied to voice transmission depending on the situation. This method will be helpful for voice transmission where the target is to send only the voice message; the very small noise that results does not affect the tone of the voice. As the distortion level is very low and the overall performance is good, the scheme can be accepted. The encoding and decoding processes described in this paper are very straightforward, so the technique is easy to implement at both the software and the hardware level. In future this method can be improved by smoothing the sharp edges and thus making the decoded signal closer to the original.

Acknowledgements

We would like to express our thanks to Manju Reddy for sending us the QAI Technical Report [4], to Rajsekaran and Venugopal for assisting us in implementing several variations of our technique on recorded voice messages, and to Apurba Krishna Deb for his insightful comments and suggestions.

References:

[1] DPS (1990), Digital Paper Solutions, Inc., Westmont.
[2] F. Gustafsson, Adaptive Filtering and Change Detection, J. Wiley, New York, 2000.
[3] J. J. K. Ó Ruanaidh and W. J. Fitzgerald, Numerical Bayesian Methods Applied to Signal Processing, Springer-Verlag, New York, 1996.
[4] QAI Technical Report (1992), Quality America Inc.
[5] R. Cmejla and P. Sovka, "Audio Signal Segmentation Using Recursive Bayesian Change-Point Detectors," 3rd WSEAS International Conference on Signal Processing, Robotics and Automation, Salzburg, Austria, 2004.
[6] Wikipedia Technical Journal (1996).
[7] Wikipedia Technical Journal (1998).
[8] ZSoft (1988), PCX Technical Reference Manual, ZSoft Corporation.