
A look at the MPEG video coding standard for variable bit rate video transmission 1

Pramod Pancha and Magda El Zarki
Department of Electrical Engineering, University of Pennsylvania, Philadelphia PA 19104, U.S.A.
TEL: (215) 898-9780   FAX: (215) 573-2068
e-mail: pancha@ee.upenn.edu, magda@ee.upenn.edu

1. This work was supported in part by grants NCR 89-14447 and NCR 90-16165 from the National Science Foundation.

Abstract

In this paper we take a close look at the MPEG video coding standard for the transmission of variable bit rate video on ATM based Broadband ISDN. The MPEG standard has been defined for use in a variety of applications. Our focus in this paper is on its use for real time transmission of broadcast quality video. We were particularly interested in observing the impact of two key parameters defined in the standard, the intraframe to interframe picture ratio and the quantization index, on the bit rates per frame. These parameters may be used to control video sources depending on the state of the network. Also, as opposed to previous work which looks only at bit rates per frame, we study the bits generated per macroblock, the basic MPEG coding unit. By packetizing these bits, we obtain insights into the cell arrival process to a network for an MPEG video source.

1.0 Introduction

The transmission of variable bit rate video over asynchronous transfer mode (ATM) based Broadband ISDN (B-ISDN) has been the focus of much research recently. Several issues relating to the interaction of a video source and the network are still open topics [10]. The two major issues of interest are the effect of network parameters such as delay and loss on video service, and the effect of high bit rate video sources on network performance. The detailed characterization of a single video source is an important first step in any effort to study these issues.

One of the obstacles to such characterization has been the variety of techniques which can be used to encode video sequences. In addition, each type of video source, i.e. video telephone, video conference, broadcast television, usually needs to be characterized independently. The coding technique and the source from which the video sequence is obtained must therefore always be specified for a source characterization study to be meaningful.

Many types of video coding algorithms have been proposed for variable bit rate video services. This variety in video coding techniques has made the task of characterizing video sources difficult. Previous studies of bit rate statistics of video sources have utilized coding algorithms, such as interframe coding combined with variable length codes [9] or conditional replenishment [6][8], that are less efficient than current coding techniques. Other techniques such as subband coding [1] have demonstrated promise for only a few types of services. Another recent study, which provides insights into long run statistics for coded video sequences, used block based intraframe discrete cosine transform (DCT) coding [3], but the video coder in that study does not utilize any interframe coding. Statistics obtained in these studies, while important, may not be valid for current coding techniques.

A source characterization method which models the properties of a large number of sources is ideally required for the purposes of analysis. One approach for achieving such models for video sources is to identify basic components of video sources and video sequences and from these properties predict bit rates for these sources [7].
While this can lead to a universal source characterization framework, the basic components which need to be identified for modeling the source may not be readily available and may vary among different classes of coding algorithms. The parameters needed for such an approach may therefore not be easy to obtain. The other approach is to model video sources where the source coding is based on a standard algorithm which is flexible enough to be applied to a multitude of video services; this is the approach taken in this paper.

We study the output stream of a video coder, which complies with the Motion Pictures Expert Group (MPEG) coding standard, with an NTSC quality video sequence as the input. Unlike previous works, we do not examine just bit rates per frame but also the manner in which these bits may arrive within the time period of a frame. This involves examining the bit generation process at the lowest level possible and then applying a packetization algorithm to generate ATM cells. Since the MPEG video coding algorithm has been proposed for a variety of applications, we also investigate the effect of changing some of the coding parameters on the statistics of interest.

The outline of the paper is as follows: Section 2 presents an overview of the MPEG coding algorithm, including the modifications made for our real time application; Section 3 discusses modeling issues for the MPEG video source; and Section 4 describes some of the statistical results obtained.

2.0 MPEG Coding of NTSC Quality Video

2.1 Overview

The MPEG coding algorithm was developed primarily for storage of compressed video on digital storage media. Provisions were therefore made in the algorithm to enable random access, fast forward/reverse searches and other features when decoding from a digital storage medium. However, the coding standard is flexible enough to be suitable for a much wider range of video applications. Recent applications of MPEG-like coding algorithms have appeared for video services ranging from high definition television to multimedia workstations.

The basic coding loop for the MPEG algorithm is shown in Figure 1. The main processing techniques of the MPEG algorithm combine intra- and interframe coding: block-based motion compensation for interframe coding and block-based DCT for intraframe coding. Motion compensation can be either causal (prediction) or non-causal (interpolation). The sequence of coding, in general, consists of first applying motion compensation. The prediction error resulting from this step is coded using the DCT. Finally, the motion vectors are variable length coded and transmitted, while the DCT coefficients are quantized, variable length coded and then transmitted. An outline of the coding algorithm and structure is presented below, with emphasis on the parameters and processes that are of interest in understanding the cell arrival process. For a fuller description of the MPEG coding standard and associated background information see [2].

The decision parameters for the compression algorithm (source coder) were obtained from the MPEG Video Simulation Model Three (SM3) report issued by the Simulation Model Editorial group of the ISO-IEC working group. The simulation model algorithm was modified to make it more suitable for real-time transmission of video, as described below.

The input video sequence for the MPEG compression algorithm was a 3 minute 40 second sequence from the movie Star Wars. This segment was chosen such that it contained scenes made up of a mix of frames with and without motion. The sequence was digitized from laser disc, which has a resolution close to NTSC broadcast quality. The spatial resolution of the digitized frames was 512 by 480 pixels. Note that this spatial resolution is significantly higher than the 352 by 288 pixel frames recommended by the SM3 for achieving video tape quality. The average bit rates needed to code this video sequence will therefore be correspondingly higher.

2.2 Coding Layers

The MPEG simulation model utilized for generating variable bit rate video is organized into four layers. The layers are arranged as follows, in order of decreasing size: 1. Picture, 2. Slice, 3. Macroblock, 4. Block.

A picture (or frame) is the basic unit of display. The dimension in pixels of a frame is variable and is determined by the resolution required for a particular application. Our study used frame sizes of 512 by 480 luminance pixels. The chrominance components were subsampled to give the digitized image the 2:1 luminance to chrominance pixel ratio required for the MPEG coding scheme.

A slice, which is a horizontal strip within a frame, is the basic processing unit in the MPEG coding scheme. In this case, each frame contains 30 slices that are 512x16 luminance pixels in size. If an image is digitized one scan line at a time, coding can only begin when all pixels in the current slice are obtained. Further, a slice is the smallest self contained coding unit: unlike the lower layers, where differential coding is used, there is no interdependence between coefficient values generated for a slice and its neighbor. This is an important observation for the process of error recovery within a frame.

A macroblock is the basic coding unit in both prediction and intraframe mode. All variable length coding is done on a macroblock basis. A macroblock consists of four 8x8 blocks of luminance pixels and two 8x8 chrominance blocks. Each frame therefore comprises 960 macroblocks. Motion estimation is performed at the macroblock layer, and decisions on when to utilize motion compensation are made on a macroblock by macroblock basis.

The smallest unit is the block, which is an 8x8 block of pixels. The input image format for an MPEG coder requires the number of chrominance pixels to be half the number of luminance pixels; the area in the image spanned by a chrominance block is therefore twice that of a luminance block. The block is the coding unit for intraframe DCT coded pictures.

Modeling for MPEG video sources must be done at the level of one of these layers. We chose to study the bit generation process at the macroblock layer, since macroblocks are the basic coding unit for the video source. Studying the coding process at a lower layer would be time consuming and unnecessary, while studying the process at a higher layer may result in the loss of information.
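Purely for orientation, the layer arithmetic implied by the numbers above can be checked in a few lines of Python (a sketch of our own; all variable names are illustrative):

```python
WIDTH, HEIGHT = 512, 480    # luminance resolution used in this study
MB = 16                     # a macroblock covers 16x16 luminance pixels

slices_per_frame = HEIGHT // MB                                     # 512x16 strips -> 30
macroblocks_per_slice = WIDTH // MB                                 #               -> 32
macroblocks_per_frame = slices_per_frame * macroblocks_per_slice    #               -> 960
blocks_per_macroblock = 4 + 2    # four 8x8 luminance + two 8x8 chrominance blocks

print(slices_per_frame, macroblocks_per_frame, blocks_per_macroblock)  # 30 960 6
```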

2.3 Coding Modes

The MPEG coding scheme allows for several coding modes at the picture and macroblock layer. At the picture layer, three coding modes exist: intraframe, predictive and interpolative. Frames coded in intraframe mode contain only intraframe coded macroblocks. For predictive coded frames, however, macroblocks can be coded in either motion compensated or intraframe mode. The choice of coding modes influences the coding efficiency of the MPEG algorithm. However, not all modes may be suitable for all coding applications. For real time sources, interpolative coding may not be suitable, as it is non-causal and requires out of sequence transmission of frames. This would involve a larger reconstruction delay and larger buffers at the receiver. In this simulation model, picture coding has therefore been restricted to intraframe and predictive interframe modes. The ratio of frames coded using interframe mode to those coded using intraframe mode is important, since it can be used as a control parameter for network transmission of video. This ratio, N, is usually fixed at the time of setup but can be varied in the middle of service, should the need arise to adapt to network conditions.

In intraframe mode, a frame is processed block by block. Each 8x8 block is first transformed using a two dimensional (2D) DCT. The coefficients from this transformation are then quantized. A quantizer matrix ensures that the low frequency components are quantized with a small step size while the higher frequency components are quantized more coarsely. The DC components, which remain fairly constant throughout a frame, are coded differentially using the DC value of the previous block in the slice as a predictor. This predictor is reset to 128 at the beginning of every slice. Once the 64 coded coefficients for each block in a macroblock are obtained, variable length codes are generated for the macroblock. These variable length codes are specified in the MPEG standard for certain combinations of runs of zeros followed by a level. If no variable length code exists for a particular combination of run and level, a fixed length code is used.

In predictive picture coding mode, frames are processed on a macroblock basis. The first step in this process is macroblock based motion estimation. A square area of 30 pixels around each macroblock is chosen as the motion vector search area. A potential motion vector is identified within the search area which minimizes the absolute macroblock difference between the current macroblock and the displaced (predicted) macroblock. If the absolute macroblock difference is less than some threshold level, the motion vectors are differentially coded and transmitted using variable length codes. Next, the prediction error, after motion compensation is applied to the macroblock, is encoded using the DCT. The coefficients obtained from the transform of the prediction error are then quantized coarsely with a flat quantizer matrix and variable length coded as in the case of intraframe mode.

If a fixed quantizer matrix is used for each coding mode, it is possible to scale all quantization levels using the quantizer scaling parameter, q. In fixed bit rate coding, this parameter can be increased to ensure that the number of bits generated by the coding algorithm is less than or equal to the target bit rate. However, when q is increased, visual blocking effects become more pronounced when the frame is reconstructed at the receiver.
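The following Python sketch illustrates the two computational kernels described above: an orthonormal 8x8 DCT with matrix-scaled quantization, and full-search block matching over a window read here as roughly +/-15 pixels. The quantizer matrices, the q scaling rule and all names are illustrative assumptions of ours, not values taken from the MPEG standard or from SM3.

```python
import numpy as np

def dct2(block):
    """Orthonormal 8x8 2-D DCT-II, built from the 1-D DCT basis matrix."""
    n = np.arange(8)
    C = np.sqrt(2.0 / 8) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / 16)
    C[0, :] = np.sqrt(1.0 / 8)
    return C @ block @ C.T

# Illustrative quantizer matrices (NOT the standard's tables): small steps at low
# frequencies and larger steps at high frequencies for intra blocks; a flat
# matrix for prediction-error blocks, as described in the text.
INTRA_MATRIX = 8.0 + 2.0 * np.add.outer(np.arange(8), np.arange(8))
FLAT_MATRIX = 16.0 * np.ones((8, 8))

def quantize(coeffs, q, matrix):
    """Quantize DCT coefficients; the step size is the matrix entry scaled by q."""
    return np.round(8.0 * coeffs / (q * matrix)).astype(int)

def code_intra_block(block, q):
    """Intraframe path for one 8x8 block of pixel values in 0..255."""
    return quantize(dct2(block - 128.0), q, INTRA_MATRIX)

def motion_estimate(cur, ref, mb_row, mb_col, search=15):
    """Full-search block matching for one 16x16 macroblock: return the motion
    vector minimizing the absolute macroblock difference within +/-search pixels."""
    y0, x0 = 16 * mb_row, 16 * mb_col
    target = cur[y0:y0 + 16, x0:x0 + 16].astype(float)
    best_sad, best_vec = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + 16 > ref.shape[0] or x + 16 > ref.shape[1]:
                continue
            sad = np.abs(target - ref[y:y + 16, x:x + 16]).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad
```

A real MPEG coder would additionally apply the run/level variable length codes to the quantized coefficients and code the motion vectors differentially, as described above.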
3.0 Modeling MPEG Video

For the purposes of modeling video traffic on B-ISDN networks, analysis of the coding process is important in determining the manner in which packets are created at the video source and arrive to the network. The little that is known about this cell generation process for a variable bit rate video source is found in [5]. In that work, it was shown that the cell arrival process from a video source can have a complicated distribution. Most authors, however, have assumed, for the sake of convenience, arrival processes for video streams which lead to queueing behavior that can be analyzed easily. However, actual cell arrivals to a network from a video source may differ significantly from these models and therefore need to be studied in detail.

Figure 1 shows the MPEG coding loop described in the previous section. It can be observed that the algorithm has the same basic structure for processing all frames. It is therefore possible to reasonably model the output of the source once the functioning of this loop is understood. In both picture coding modes, each frame is first decomposed into its component macroblocks and each macroblock is then coded as shown in Figure 1. The important steps of the coding process are shown in the boxes of Figure 1. In order to model network traffic, we are interested in the way in which quantized coefficients and motion vectors are generated and passed to the variable length code generator. The variable length codes created represent the bit stream of the video source.

The bit stream generation process for a macroblock is composed of four major functions: DCT/inverse DCT, quantization, motion estimation and variable length coding. The processes of quantization and variable length coding can be carried out at little computational expense. The two other processes, an 8x8 2D DCT and motion estimation of a macroblock, are computationally intensive. If we assume that video information is always present at the input to the coder, the time interval between coding successive macroblocks will be limited by the speed at which these functions can be performed in hardware. This will in turn limit the rate at which ATM cells are created at the video coder output. If we let x1 denote the time for an 8x8 DCT, and x2 the time taken for motion estimation to be performed on a macroblock, then the time taken to code a macroblock, X, can be expressed as a function of x1 and x2. Statistics for interarrival times can therefore be expressed in units of X seconds. Further, since each frame undergoes the same amount of processing, it is reasonable to assume that the time taken to generate bits for one frame can be denoted by a constant, Tg. Both these constants will be determined by the hardware implementation of the video coder.

The MPEG algorithm uses several parameters which, when varied, may lead to a significant change in the characteristics of the video source. Two of these parameters, N, the interframe to intraframe ratio, and q, the quantizer scale, are especially interesting as they can potentially be used for controlling a video source through some network feedback mechanism. For example, if the network cell error rate temporarily increases, then by decreasing N we could increase the frequency of intraframe coded pictures. This increase in the number of intraframe coded pictures will ensure that any errors in reconstruction of a frame will not propagate for more than N frames. Similarly, if the network detects the onset of congestion, a feedback message could be sent to all video sources requiring them to decrease their bit rates. The easiest way to accomplish this is by increasing q, at the expense of a temporary loss in image quality. It is therefore important not only to characterize the video source under normal operation, but also to quantify the changes that occur when these parameters are modified.

A video source can be characterized reasonably well using three statistical properties: total cell arrivals per frame, burst interarrival times and burst length distribution. While previous statistical studies have been restricted to total cell arrivals per frame, the information on how cells arrive cannot be discerned from these studies. The cell arrival process can be modeled well given the burst interarrival and burst length statistics for a frame. Further, some of the ambiguity that exists when modeling video sources can be removed by specifying these statistics.

The total cell arrivals per frame is obtained by first determining the number of bits generated per frame and then converting this number to the equivalent number of 48 byte ATM cells. The burst interarrival times are calculated from the bits generated per macroblock within a frame. As stated above, this interarrival time is normalized to units of X seconds.

We assume a simple packetization procedure for the source. Cells are formed when the sum of bits generated by successive macroblocks is greater than or equal to the size of an ATM cell. It is also assumed that bits generated for a frame are not combined with the bits generated for the next frame. This implies that a minimum of two cells are always sent for each frame: a beginning of frame cell containing header information, and an end of frame cell, which may be half filled, signifying the end of the current frame.

The third statistic of interest is burst length. While it has been generally argued that video traffic is bursty, the definition of a burst is not always clear. When considering bit rates on a per frame basis, a burst event can be said to occur when the bit rate for a frame is significantly larger than the average bit rate. However, a burst can also be defined at several other levels. In this study, a burst is said to be created at the output of the coder if the number of bits generated for a macroblock requires 2 ATM cells or more when packetized.
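To make the packetization rule concrete, here is a minimal Python reading of it (a sketch under our own assumptions: 48 byte = 384 bit cell payloads, and the beginning-of-frame header cell mentioned above omitted for brevity). Counting the cells completed while each macroblock's bits are absorbed as that macroblock's burst is one plausible interpretation of the burst definition.

```python
CELL_PAYLOAD_BITS = 48 * 8   # 48 byte ATM cell payload

def packetize_frame(bits_per_macroblock):
    """Pack the macroblock bit counts of ONE frame into ATM cells.

    Returns (cells_this_frame, cells_per_macroblock, emission_times), with
    emission times in units of X (one macroblock coding interval). A macroblock
    whose bits complete two or more cells counts as a burst of that length.
    """
    total_cells, cells_per_mb, emission_times = 0, [], []
    residual = 0                      # bits waiting for the current cell to fill
    for t, bits in enumerate(bits_per_macroblock):
        residual += bits
        emitted, residual = divmod(residual, CELL_PAYLOAD_BITS)
        total_cells += emitted
        emission_times.extend([t] * emitted)
        cells_per_mb.append(emitted)
    if residual > 0:                  # bits never spill into the next frame:
        total_cells += 1              # flush a final, possibly part-filled cell
        emission_times.append(len(bits_per_macroblock) - 1)
    return total_cells, cells_per_mb, emission_times
```

Burst interarrival times, in units of X, are then simply the differences between successive emission times within a frame.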
4.0 Results

The coding algorithm was initially run several times to obtain an indication of the normal operating parameters for q. This was done by checking the reconstructed frames to see if their visual quality was acceptable. With q set to 8, as recommended in the SM3 specifications, the image quality was judged to be good for all viewed frames. When q is increased beyond this value, some blocking effects are visible in the reconstructed frames. In this study, we investigated the effect of varying q over three values (4, 8 and 16) on the traffic characteristics of the coded video bit stream. The effect of varying the type of coder from a pure intraframe coder (N=1) to a mixed intra/interframe coder (N=16) to a pure interframe coder (N=2592) was also investigated.

The traffic characteristics of the video source are presented in Tables 1 and 2 and Figures 4 and 5. Table 1 shows the mean and variance of the number of cell arrivals per frame for all the (N, q) combinations studied. The mean bit rate for this 3 minute 40 second sequence with N=16 and q=8 is 1.2 Mb/s. This value is within the range we would expect for this coder, since the frame resolution we utilize is higher than the format recommended in the MPEG specifications. An interesting observation is that the mean and variance of the cell arrival statistics for the (N=16, q=4) and (N=2592, q=4) coders are very similar. This implies that there may be no advantage in using a value of N greater than 16 in a network environment.

The burst length distributions of the coder for different (N, q) combinations are shown in Table 2. It can be seen that in intraframe coding (N=1) bursts of more than one cell per macroblock are rarely produced, while for interframe coding (N=16) burst lengths can reach up to seven cells per macroblock.

Figures 2 and 3 illustrate a sample bits per frame and bits per macroblock time series, respectively, for the coder under normal operation (N=16, q=8). The periodic impulses in the bits per frame time series (Figure 2), which occur every 16 frames, are due to the intraframe coded frames. Figure 3 illustrates the same process at a finer layer. In this sequence of 5 frames (4800 macroblocks), the last 4 frames are coded in the predictive picture mode, while the first is intraframe coded. In general, the average number of bits generated per macroblock for intraframe coding is higher than in the predictive case, while the variance is lower. The peaks in the number of bits generated per macroblock in Figure 3 are not noticeable when we consider only total bits per frame as in Figure 2. This information can be quite important when considering buffer sizing at network nodes. We also notice that the peaks generated in the interframe picture mode are larger than the peaks in intraframe mode. One reason for this effect is that when a macroblock is coded in the intraframe mode within a predictively (interframe) coded picture, the run length coding scheme is not as efficient.

The distribution of the ATM cells generated per frame by the coder for different (N, q) combinations is shown in Figure 4. To generate a smoother envelope for the probability mass function (pmf), samples were gathered into bins of 10 ATM cells. It is interesting to note that the distribution for an (N=1, q=4) coder, which corresponds to a coder utilizing only intraframe DCT coding, is quite different from that of an (N=16, q=4) coder, in that both the average number of cells per frame is higher and the spread of the distribution is greater. It can also be noticed that increasing q causes the shape of the pmf to become more one sided. This observation is somewhat similar to the observations in [9], where it was shown that a lower resolution source, like video conferencing, had a skewed probability density function of bit rates when compared to a broadcast television source.

Figure 5 shows the interarrival distribution for bursts within a frame. The primary trend apparent from the data is that as q decreases, the envelope of the interarrival time distribution tends towards an exponential, most notably for q=4. In general, these appear to be more complex distributions which may need to be modeled as a combination of distributions, as observed in [5].

5.0 Statistical Analysis

The statistical properties of video coder bit streams are not well understood. Knowledge of these properties is essential for determining the best strategies for allocating resources for these types of services. From the network point of view, two statistical quantities are extremely important in determining performance: the distribution of the number of cells generated per frame, and correlations in the cell generation process. We investigate these properties for various N and q parameters in MPEG coded video.

The autocorrelation function for cells generated per frame for 3 different values of N and q is shown in Figure 6. The horizontal line in the figure indicates the values below which the autocorrelation function is almost zero. For the cases where both interframe and intraframe coding were utilized (N=16), impulses in the autocorrelation function occur every N frames, confirming that significant correlation exists between the intraframe coded components of the cell generation process. For pure intraframe (N=1) or interframe (N=2592) coding, correlation between cells generated per frame exists for up to 300 frames. However, the magnitude of the correlation is not large for frame lags greater than around 50 frames.
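For reference, such a per-frame autocorrelation can be computed directly from the cell counts; a minimal numpy sketch (the variable name cells_per_frame is ours, e.g. the per-frame output of the packetizer sketched earlier):

```python
import numpy as np

def autocorr(series, max_lag=300):
    """Normalized sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# acf = autocorr(cells_per_frame)
# For the N=16 coders the result should show impulses at lags 16, 32, 48, ...
```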

In [4], it was noted that a gamma distribution fits the empirical distribution of the number of cells generated per frame quite well for a variety of video conferencing sources. It was also hypothesized that the gamma distribution might be suitable for a variety of sources and coding algorithms. We tested this hypothesis using a quantile-quantile (Q-Q) plot of the observed values against the expected values for a gamma distribution. The Q-Q plots for various q values are shown in Figure 7. It is interesting to observe that the gamma distribution fits the empirical distribution of the number of cells generated per frame extremely well for low bit rate video sources (q=16). However, for higher quality sources (q=4, 8), the Q-Q plot deviates significantly from the expected values for a gamma distribution. It is therefore unlikely that a gamma distribution can be utilized as a fit for all types of video sources. In fact, the distributions of ATM cells per frame for higher quality video coders do not appear to follow any smooth distribution, making it difficult to fit a distribution to this cell generation process (see Figure 4).

The observation described above implies that if network resource allocation decisions are to be made using detailed source characterization, it may be necessary to examine statistics for sources at a smaller scale in order to obtain a better model. One approach could be to characterize the cell generation process at the slice layer.
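A sketch of how such a Q-Q comparison can be produced, assuming the gamma parameters are fitted by the method of moments (the fitting procedure used in [4] is not specified here, so this is only one plausible choice):

```python
import numpy as np
from scipy import stats

def gamma_qq(samples):
    """Observed vs. expected quantiles against a moment-matched gamma fit."""
    x = np.sort(np.asarray(samples, dtype=float))
    shape = x.mean() ** 2 / x.var()      # method-of-moments gamma parameters
    scale = x.var() / x.mean()
    probs = (np.arange(1, x.size + 1) - 0.5) / x.size   # plotting positions
    expected = stats.gamma.ppf(probs, a=shape, scale=scale)
    return expected, x                   # plot expected against observed
```

A straight 45 degree line in the resulting plot indicates a good gamma fit; the deviations reported above for q=4 and q=8 would show up as systematic curvature.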
6.0 Conclusions

The study of the statistical properties of packet video streams, in order to model video sources, is a required step in the process of designing B-ISDN networks to handle heterogeneous traffic. This work is especially critical both because of the high bandwidth required for each video connection and because of the delay sensitive nature of real-time video. The effort to model a video source is made more difficult by the sensitivity of the statistics to changes in coding algorithms. Previous works on collecting statistics for modeling video sources have used a wide variety of coding algorithms. A serious drawback of this is that, over time, many of these coding algorithms have become outdated or unpopular. With work on a video coding standard, the MPEG algorithm, nearing completion, statistical studies are required to determine the effect of this standard on modeling video sources. This work is especially important for network modeling because of the many proposed uses of MPEG-like algorithms for video services ranging from HDTV to multimedia communications.

By studying the underlying coding process, we have determined a simple but plausible packetization process for a real world video coder. The statistics obtained for the ATM cell distribution per frame and the interarrival times for bursts based on this packetizing scheme can lead to modeling at finer time scales. We have also introduced the parameters x1 and x2, which are important for analyzing the effect of video sources on the network. If, for example, x1 and x2 are very short compared to the time period of a frame, then the source can be considered to be active for only a short period at the beginning of every frame, and from the point of view of the network all cells arrive almost simultaneously as a large burst. In addition, we have obtained statistics for MPEG sources at different operating points (N, q) which will be useful for modeling the traffic effects of various network feedback mechanisms.

Future work will extend these results to determine analytical models for MPEG video sources, as well as investigate the ways in which the parameters (N, q) can be utilized in a network environment. Subjective visual testing will be employed to determine the effects of these parameters on a real video service.

Acknowledgments

We wish to thank Mark Garrett and Dan Wilson at Bellcore for their time and effort spent in digitizing the Star Wars video sequence and for loaning equipment needed for this work. We also wish to thank Rashid Ansari and Fure-Ching Jeng at Bellcore for providing documentation on the MPEG coding standard.

References

[1] P. Douglas, G. Karlsson, and M. Vetterli. Statistical analysis of the output rate of a sub-band video coder. In SPIE Visual Communications and Image Processing '88, pages 1011-1024, 1988.
[2] D. Le Gall. MPEG: A video compression standard for multimedia applications. Communications of the ACM, 34(4):46-58, April 1991.
[3] M. Garrett and M. Vetterli. Congestion control strategies for packet video. In Fourth International Workshop on Packet Video, August 1991.
[4] D. Heyman, A. Tabatabai, and T. V. Lakshman. Statistical analysis and simulation study of video teleconference traffic in ATM networks. Submitted for publication, 1991.
[5] R. Kishimoto, Y. Ogata, and F. Inumaru. Generation interval distribution characteristics of packetized variable bit rate video coding data streams in an ATM network. IEEE J. on Sel. Areas in Comm., 7(5):833-841, June 1989.
[6] B. Maglaris, D. Anastassiou, P. Sen, G. Karlsson, and J. D. Robbins. Performance models of statistical multiplexing in packet video communications. IEEE Trans. on Comm., 36(7):834-844, July 1988.
[7] R. M. Rodriguez-Dagnino, M. R. K. Khansari, and A. Leon-Garcia. Prediction of bit rate sequences of encoded video signals. IEEE J. on Sel. Areas in Comm., 9(3):305-313, April 1991.
[8] P. Sen, B. Maglaris, N.-E. Rikli, and D. Anastassiou. Models for packet switching of variable-bit-rate video sources. IEEE J. on Sel. Areas in Comm., 7(5):865-869, June 1989.
[9] W. Verbiest, L. Pinnoo, and B. Voeten. The impact of the ATM concept on video coding. IEEE J. on Sel. Areas in Comm., 6(9):1623-1632, December 1988.
[10] Y.-Q. Zhang, W. W. Wu, K. S. Kim, R. L. Pickholtz, and J. Ramasastry. Variable bit rate video transmission in the broadband ISDN environment. Proceedings of the IEEE, 79(2):214-221, February 1991.

TABLE 1. Mean and variance of cell arrivals per frame

   N      q     Mean      Variance
   16     8     105.68     2951.31
   16     4     204.49     9287.37
   16     16     62.80     1127.41
    1     8     189.98     3924.52
    1     4     297.73    11752.73
  2592    4     203.62     8875.34

TABLE 2. Distribution of burst lengths, P{burst length = i cells}

   N      q     i=1      i=2      i=3      i=4      i=5      i=6
   16     8     0.9791   0.0192   0.0014   0.0002   0.0000   0.0000
   16     4     0.9642   0.0303   0.0044   0.0008   0.0002   0.0001
   16     16    0.9948   0.0050   0.0001   0.0000   0.0000   0.0000
    1     8     0.9998   0.0002   0.0000   0.0000   0.0000   0.0000
    1     4     0.9859   0.0141   0.0000   0.0000   0.0000   0.0000
  2592    4     0.9570   0.0338   0.0053   0.0009   0.0027   0.0002

FIGURE 1. Basic MPEG coding loop. [Block diagram not reproduced. Legend: DCT: Discrete Cosine Transform; FM: Frame Memory; Q: Quantizer; ME: Motion Estimation; VLC: Variable Length Code Generator.]

FIGURE 2. Bits generated per frame for the entire sequence, with an enlarged 30 second segment. [Plots of bits vs. frame number; not reproduced.]

FIGURE 3. Bits generated per macroblock for 5 frames (4800 macroblocks) of the sequence. [Plot of bits vs. macroblock index; not reproduced.]

FIGURE 4. Distribution of ATM cells per frame for the combinations (N=16, q=8), (N=16, q=4), (N=16, q=16), (N=1, q=4) and (N=2592, q=4). [Plots of probability vs. cell arrivals per frame; not reproduced.]

FIGURE 5. Burst interarrival time probability mass function, with interarrival time in units of X seconds. Panel means: (N=16, q=8) 8.9; (N=16, q=4) 4.6; (N=16, q=16) 14.9; (N=1, q=4) 3.0; (N=2592, q=4) 4.6. [Plots not reproduced.]

FIGURE 6. Autocorrelation function for cells generated per frame, for (N=1, q=4), (N=16, q=4) and (N=2592, q=4), over lags of 0 to 300 frames. [Plots not reproduced.]

FIGURE 7. Quantile-quantile plot of the gamma fit for cells generated per frame (observed vs. expected values) for N=16 with q=4, 8 and 16. [Plot not reproduced.]