VVD: VCR operations for Video on Demand


Ravi T. Rao, Charles B. Owen*
Michigan State University, 3115 Engineering Building, East Lansing, MI 48823

ABSTRACT

Current Video on Demand (VoD) systems do not provide for interactive VCR-like scan operations such as rewind or fast-forward. Pre-storing the entire movie on a set-top box to implement such operations is currently neither practical nor feasible. Other approaches to VCR-like controls are either computationally complex on the server or consume large amounts of bandwidth during the operation. This paper proposes a novel technique, VVD, for real-time implementation of VCR-like operations, supporting multiple scan speeds, for Video on Demand with low latency and overhead. VVD uses multi-level compression of movie files to encode the same movie at a hierarchy of playout rates. Depending on the rate at which a scan operation is requested, the corresponding compressed movie file is transmitted. Frame dropping and repetition are used to provide smooth speed transitions among the multiple levels. The overhead of this scheme can be kept below 100% of the movie storage size.

Keywords: VVD, Video on demand, VCR operations, fast-forward, rewind, multi-level compression, MPEG, scan files

1. INTRODUCTION

Video on Demand (VoD) services allow users to decide which programs they want to watch and to play those movies on demand [2,3,4]. These systems have received a great deal of attention in recent years, particularly in Intranet and educational applications. In these applications, training materials may need to be paused or elements of a lesson repeated, so scan controls are very useful to users. However, existing systems provide only limited facilities for scan operations such as fast-forward and rewind. This paper proposes VCR Operations for Video on Demand (VVD), an approach to providing VCR-like scan operations in an MPEG-based VoD system.
The system utilizes hierarchical representations of the movie, video playout-rate modification, and simple annotation data to provide interactive rewind and fast-forward at varying speeds.

From a user's perspective, fast-forward and rewind are highly desirable features. Providing them has typically required either download and buffering of an entire video or server synthesis of scan streams. Video presentations have very large file sizes, so download is both time-consuming and demanding of buffer space. Server synthesis of scan streams requires complex analysis and re-creation of the underlying frame sequences, as well as recompression for transmission, and is computationally demanding.

Video on Demand has two related issues. The first is support for VCR-like scan operations (such as rewind and fast-forward). The second is the ability to broadcast many different movies to different users at regular intervals within the constraints of limited bandwidth.

Shenoy and Vin [1] propose implementing scan operations through a base stream and an enhancement stream. The base stream carries basic information and is combined with the enhancement stream during normal playout. When the user requests a scan operation, only the base stream is transmitted, since movie quality is less important during such operations. A problem with this approach is that it allows only a single rate of rewind or fast-forward. Moreover, regular transmission now involves processing two different streams, which may make real-time application of this solution non-trivial.

Chen et al. [2] suggest the use of a disk array for VCR operations. A movie is divided into segments, each corresponding to a GOP, and each successive segment is stored on the next available disk in the disk array.
In this way, segments to be transmitted can be retrieved in parallel, increasing throughput, and accessing the appropriate disk segments can simulate rewind and fast-forward operations. For example, if a viewer requests a fast-forward at twice the regular playout speed, the server accesses the first, third, and fifth disks and transmits the segments from them. The granularity of this solution is a segment, so the display can be choppy even at low scan speeds.

* Correspondence: Email: cbowen@cse.msu.edu; WWW: http://metlab.cse.msu.edu; Telephone: 517-353-6488

Part of the SPIE Conference on Multimedia Systems and Applications II, Boston, Massachusetts, September 1999, SPIE Vol. 3845. 0277-786X/99/$10.00
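The striping scheme just described can be sketched as follows (a hypothetical illustration; the round-robin, one-GOP-per-disk layout is from the text, but the function names are not):

```python
# Sketch of the disk-array scheme of Chen et al.: one GOP-sized
# segment per disk, assigned round-robin. Illustrative only.

def disk_for_segment(seg_index, num_disks):
    """Round-robin striping: segment i lives on disk i mod num_disks."""
    return seg_index % num_disks

def segments_for_scan(num_segments, speed):
    """Segment indices fetched for a `speed`-times fast-forward:
    every `speed`-th segment, skipping the rest."""
    return list(range(0, num_segments, speed))

# A 2x fast-forward over 10 segments fetches segments 0, 2, 4, 6, 8,
# i.e. the "first, third, fifth" segments in 1-based terms.
print(segments_for_scan(10, 2))                          # [0, 2, 4, 6, 8]
print([disk_for_segment(s, 4) for s in segments_for_scan(10, 2)])
```

Note that with an even scan speed and an even number of disks, only a subset of the disks is touched, which is one source of the uneven load and choppiness mentioned above.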

Another problem typically faced by cable operators is that multiple viewers may be watching the same movie at different points in time. This problem is addressed by Basu et al. [3], who suggest altering the playout rate of a movie so that one stream can catch up to another in time. Viewers are grouped into multicast clusters, and those behind are merged with the next cluster by dropping frames from the movie. As a result, a channel can be freed and bandwidth conserved. The authors also suggest merging through content insertion (such as commercials) for viewers whose streams are ahead of larger groups watching the same movie.

Bandwidth considerations for scan operations are addressed by Dey-Sircar et al. [4], who associate a Quality of Service (QoS) guarantee with scan operations. A rewind or fast-forward operation is allowed only if there is sufficient bandwidth to transmit the stream at the required higher rate; otherwise, the request is rejected. Bandwidth constraints clearly complicate a real-time solution for scan-operation support by video servers. However, even with infinite bandwidth, it would still be difficult to support both rewind and fast-forward. The rewind operation, in particular, is quite tricky to support in real time in compression formats such as MPEG. MPEG, discussed in more detail in the next section, uses inter-frame compression techniques that require reordering of frames and induce a large amount of dependency among the transmitted frames. A rewind operation requires a massive reordering of frames each time it is requested. It may be possible to download large segments of a movie into a set-top box and process user requests locally; however, it will still take some time for the correct frames to be referenced before they can be displayed.
Scan operations at increased playout rates require very high bandwidths, both on the communications channel and to the set-top buffer device. Rewind is further complicated in compression environments such as MPEG by the recursive dependencies among frames. Determining the previous frame when playing in reverse may require recreating a large number of frames that the frame depends upon, leading to choppy playback that is clearly noticeable to the user. Real-time support for scan operations is therefore best facilitated if pre-processing at the server (such as merging two different streams or reordering frames in a GOP) is minimized. Similarly, it is desirable that the client-side set-top box be responsible neither for reordering frames nor for displaying frames at varying rates. Finally, it is critical that scan operations not require significantly higher bandwidth than ordinary play operations.

VVD supports varying-rate scan operations at the video server with minimal processing at both the client (viewer) and the server by storing a hierarchy of playout rates along with frame-location annotation information. Access to frames is fast and requires little server processing. The client system need only present the frames as received, though it may require modification to accommodate the varying bit-rates. The transmission bandwidth requirement is little changed from that of normal transmission.

This paper focuses on forward and reverse playback of content at normal or faster speeds. Slow motion is easily implemented at the client by simply repeating displayed frames; the server need only reduce the transmission rate of the video appropriately. Reverse slow motion is symmetrical: VVD supports normal-rate reverse playback, which can easily be slowed by the client.

2. MPEG COMPRESSION FORMAT

The Motion Picture Experts Group (MPEG) has developed several standards for transmission of compressed digital video and audio [5,6,7].
The MPEG-1 and MPEG-2 families of standards describe the compression of audio and video information for streaming applications, i.e. content that can be played as it appears on the hard drive or is received on a communications channel. MPEG also supports a number of different layers and levels, depending on such factors as how many channels are transmitted (including audio) and the transmission speed of the medium the information travels across. This discussion focuses on the MPEG-1 standard, though the concepts apply equally to MPEG-2.

2.1. MPEG video

An MPEG video stream uses intra-frame and inter-frame coding techniques to compress a digital video stream. Intra-frame coding (or intra-coding) compresses a single frame of video without regard to any surrounding frames. Non-dependent frames are necessary to provide starting points for decompression of sequences and as locations where cumulative reconstruction errors can be cancelled. Inter-frame coding (or inter-coding) exploits the large amount of redundancy between video frames to increase the compression ratio. Previous or future frames provide a template that can be used to reconstruct the inter-coded frame without transmitting as much data. Typically, MPEG video streams consist of a mix of intra- and inter-coding to provide high-quality video at a vastly reduced bandwidth.

MPEG stores image information in three types of frames: I, P, and B. I (information) frames are intra-coded, i.e. the various compression techniques are applied relative to information in the current frame and not relative to any other frame in

the video sequence. Coding of such frames is based on the discrete cosine transform of successive 8x8 blocks, and the compression algorithm exploits the response of the human eye to different characteristics of images. MPEG also exploits temporal redundancy between successive frames through the inter-coded frame types, P and B. P (prediction) frames are predicted in a forward manner from an I frame or from the frame constructed by a previous P frame. B frames are bi-directional prediction frames, interpolated between I/P frames. B frames increase coding efficiency; however, interpolation requires that the reference frame used for backward prediction be transmitted first, out of order.

As seen in Figure 1 below, the transmission and storage order differs from the order in which the frames are displayed. Frame 9 is a B frame and, as such, is dependent upon frame 10, the following I frame; hence, frame 10 must be transmitted first. Likewise, frames 11 and 12 are dependent upon the picture constructed from frame 10, an I frame, and frame 13, a P frame, so frame 13 must precede frames 11 and 12.

    Frame Type:          B   I   B   B   P   B   B   P   B   B   I
    Display Order:       9  10  11  12  13  14  15  16  17  18  19
    Transmission Order: 10   9  13  11  12  16  14  15  19  17  18

Figure 1: Transmission order

Because the transmission order differs from the display order, the implementation of scan operations, and in particular the rewind operation, becomes tricky. For the encoding pattern and display order shown in Figure 1, the order in which frames must be transmitted for a rewind operation is shown in Figure 2.
    Frame Type:          B   I   B   B   P   B   B   P   B   B   I
    Display Order:       9  10  11  12  13  14  15  16  17  18  19
    Transmission Order: 10   9  13  11  12  16  14  15  19  17  18
    Rewind Order:       19  10  13  16  18  17  15  14  12  11

Figure 2: Transmission order for the rewind operation

As can be seen, recursive reordering of the frames is required to ensure that the I frames that open a GOP are transmitted first, followed by the P frames, so that these can be used for decoding the B frames. This massive reordering can take a significant amount of time, which may not be tolerable to the viewer who requested the scan operation. More significantly, the reconstruction of frame 18 requires the reconstruction of frames 10, 13, and 16. Longer IPB sequences have an even more significant overhead.
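The reordering constraint illustrated in Figures 1 and 2 can be stated compactly: every B frame must wait for its following I or P anchor frame before it can be transmitted. A small sketch (not from the paper) that derives the forward transmission order from the display order:

```python
# Derive MPEG transmission order from display order for a GOP
# pattern, assuming each B frame needs the NEXT I or P anchor frame
# transmitted before it. Illustrative sketch, not the paper's code.

def transmission_order(frame_types):
    """frame_types: list like ['B','I','B',...] in display order.
    Returns 0-based display positions in transmission order."""
    order, pending_b = [], []
    for i, t in enumerate(frame_types):
        if t == 'B':
            pending_b.append(i)   # hold B frames until their anchor is sent
        else:                     # I or P: send it, then the held B frames
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)       # trailing B frames (anchor off-screen)
    return order

# Reproduces Figure 1 (display positions 9..19 map to indices 0..10):
print([9 + i for i in transmission_order(list("BIBBPBBPBBI"))])
# [10, 9, 13, 11, 12, 16, 14, 15, 19, 17, 18]
```

The rewind order of Figure 2 has no such simple streaming rule, which is precisely the paper's point: reverse playback forces recursive anchor-first reordering.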

An MPEG video stream is structured as a hierarchy of layers that provide for error handling, random search and editing, and synchronization. The top layer is the video sequence layer, which represents a video stream as a sequence of segments. A segment is a set of sequential frames called a group of pictures (GOP). The GOP layer consists of one or more groups of intra-coded (I) frames and inter-coded (P/B) frames. The next layer is the picture layer, which contains the video frames. This layer is subdivided into slice layers, with each slice consisting of macroblocks: 16x16 arrays of luminance pixels (picture data elements) with two 8x8 arrays of associated chrominance pixels. Macroblocks can be further divided into distinct 8x8 blocks for further processing such as transform coding. An interesting characteristic of the data format is that the elements of each layer are preceded by a byte-aligned start code, simplifying random access to elements of each layer.

2.2. MPEG audio and systems streams

In MPEG, the video stream is independent of the audio stream. The compression algorithm used for audio is based on characteristics of the human hearing model. The main advantage of keeping the audio stream separate from the video stream is that it can easily be filtered out if required. The MPEG Systems standard specifies how to multiplex MPEG Video and Audio bit streams into a single stream. An MPEG Systems stream contains the timing information MPEG players (or decoders) need to play back the video and audio properly synchronized. This paper assumes that the audio is discarded during scan operations; no attempt is made to play the audio in reverse or at a higher speed. Discarding the audio during scan operations releases at least 10% of the available bandwidth and reduces the information that must be stored in support of scan operations.

3. VVD OVERVIEW

Current Video on Demand systems have only limited support for rewind and fast-forward, and the techniques suggested are not very effective for the rewind operation. VVD defines an architecture that can seamlessly support both rewind and fast-forward at varying rates. The architecture of the system is divided into three parts, as shown in Figure 3.

3.1. VVD client

The client (viewer) is the set-top box or personal computer at the viewer side. The client is responsible for requesting the movie it wishes to watch. While watching the movie, the client can request scan operations, i.e. rewind and fast-forward. The client is assumed to have an interface similar to a "scan dial" that can rotate in either direction: rotating to the left requests a rewind, rotating to the right a fast-forward, and rotating the dial farther in the direction of the scan request increases the speed of the scan. The client also includes an MPEG decoder that decodes the incoming MPEG stream and displays it to the viewer.

3.2. Multi-level compressed files

The VVD video server maintains two hierarchies of movie files that enable it to service scan requests quickly. One hierarchy contains copies of the movie in the forward direction; the other contains copies in the reverse direction. Each file is compressed at a playout rate twice that of the previous compressed file. Only the 1X playout rate in the forward direction includes audio. The disk space required for each higher-playout file decreases exponentially. In addition, the scan-support movie files need not provide the same quality as the full-rate playout; each file represents the entire movie with fewer frames and lower resolution due to higher compression. In Figure 3, FF-1 is compressed at 2X, FF-2 at 4X, and FF-3 at 8X. Similarly, the video server maintains versions of the movie files (RW-1, RW-2, RW-3) that are accessed when the client requests a rewind.
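The exponential shrinkage of the hierarchy is easy to bound with a quick calculation, assuming (as the text does) that each level is half the size of the one before:

```python
# Back-of-the-envelope check of the scan-file hierarchy's storage
# cost, assuming each higher-rate file is half the size of the
# previous one. The original movie has size 1.0.

def hierarchy_overhead(levels):
    """Total size of the 2X, 4X, ..., 2^levels-X files relative to
    the original movie."""
    return sum(0.5 ** k for k in range(1, levels + 1))

# The fast-forward files alone stay under 100% overhead no matter
# how many levels are stored: 1/2 + 1/4 + ... < 1.
ff = hierarchy_overhead(6)                 # levels up to 64X
print(ff)                                  # 0.984375

# A full-rate reverse copy (1.0) plus its own hierarchy keeps the
# total under 300% extra disk space.
print(ff + 1.0 + hierarchy_overhead(6))    # 2.96875
```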
These files correspond to the movie at given scan speeds, in either fast-forward or rewind. Transmitting frames from them is equivalent to showing the movie in fast-forward or rewind at the specified rate (determined by the compression ratio). The server's movie-file hierarchy directly solves the scan problem for forward and reverse access at the hierarchy playout rates. Additional file annotation information, described in the following section, allows smooth transitions between the playout speeds through the use of frame dropping.

Maintaining several versions of the movie enables easy support for scan operations, but it does require additional disk storage. If we assume that each successive version of the movie is half the size of the previous one (since it is half as long), then VVD requires less than 100% storage overhead for all files relevant to the fast-forward version of the movie. An additional 200% (an extra 100% for the original movie in rewind) storage overhead is required for all files relevant to the rewind version. Therefore, with 300% extra disk space, VVD can achieve high-quality displays for any scan request. In practice, however, several simple techniques can significantly reduce this usage. The scan-support streams do not need the audio component, which is a significant

decrease in the storage requirement (about 10-20%). The quality of the movie during a scan request is typically less important than during normal-rate playout, so high compression factors can be selected for scan operations, making the files significantly smaller. Finally, elements of the stream can be shared, particularly I-frames: the reverse-playback stream can reference the I-frames in the forward stream, and the faster-playout streams can reference a subset of the I-frames from the lower-speed streams. The combination of these techniques can drop the overhead for scan support to less than 100%. Details on achieving this level of overhead are discussed in Section 5.

This overhead may seem significant, but disk storage capacity per unit cost doubles roughly every 1.5 years. In addition, the scan-support streams will generally be accessed far less frequently than the underlying movie and are unlikely to require the additional caching and replication needed for high-frequency playback. Finally, this is an ideal method for DVD, where considerable amounts of space are already available for value-added functions.

3.3. VVD video server

The video server is responsible for accepting incoming movie requests, processing client scan requests, and transmitting the MPEG stream to the client. It consists of three components: the MPEG Transmitter, the Rewind Processor, and the Fast-forward Processor. On receiving a movie request, the server copies a movie support file describing the available components into memory, initializes some tables (discussed later), and invokes the MPEG Transmitter to start transmitting the MPEG stream to the client. The Rewind Processor handles rewind requests from the client during the movie; on receipt of a rewind request, it determines the rate at which the rewind is requested.
The server then determines whether it has an MPEG file for the movie at the desired rate/compression level. If it does, it directs the MPEG Transmitter to offset to the correct location in that file and transmit from that point until it receives another request. If there is no MPEG file at the requested scan rate, the server chooses the closest MPEG file slower than the requested rate and drops frames to approximate the effective rate requested by the client. The Fast-forward Processor handles fast-forward requests from the client in the same manner as the Rewind Processor. For both scan operations, the video server filters out the audio stream, since audio is not an important component during a scan.

[Figure 3: VVD system architecture. The client contains an MPEG decoder and issues show-movie, fast-forward, and rewind requests; the server contains a Fast-Forward Processor, an MPEG Movie Transmitter, and a Rewind Processor, backed by the original movie and the scan files FF-1, FF-2, FF-3 and RW-1, RW-2, RW-3.]
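The rate-matching step just described can be sketched as follows (an illustration, not the paper's code; the power-of-two file hierarchy and the 64X cap are taken from the text, the function names are not):

```python
import math

# Sketch of rate matching: pick the fastest stored scan file not
# exceeding the requested rate (files exist at powers of two), then
# compute what fraction of its frames must be dropped.

def choose_scan_file(requested_rate, max_rate=64):
    """Largest power-of-two rate <= requested_rate, capped at max_rate."""
    rate = 2 ** int(math.log2(requested_rate))
    return min(rate, max_rate)

def drop_fraction(requested_rate, file_rate):
    """Fraction of the file's frames to drop so the effective playout
    rate approximates the requested rate."""
    return 1.0 - file_rate / requested_rate

# A 5X request falls back to the 4X file; a 9X request to the 8X
# file, dropping about 11% of its frames (the example in the text).
f = choose_scan_file(9)
print(f, round(drop_fraction(9, f), 3))   # 8 0.111
```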

4. FILE ANNOTATIONS FOR VVD

A common problem in systems that provide VCR operations for Video on Demand is that they try to reorder frames when the user requests a rewind or fast-forward. The rewind operation, whether performed at the server or locally at the client, can lead to choppy display for the viewer. VVD avoids this problem by maintaining the multi-level file hierarchy: the video server maintains the same movie at different compression levels for both rewind and fast-forward. The Berkeley encoder [9] was used to create files in the desired format from YUV files of a movie. A client request for, say, a rewind at twice the original movie rate is equivalent to transmitting frames from the rewind version of the movie whose compression level is twice that of the original. Since the server does not have to reorder frames to service a scan request, but instead just offsets to the correct location in the file, VVD is suitable for use in large-scale real-time video servers.

On receiving a scan request, the server needs to start transmitting frames from the correct location in the corresponding scan-support file. For this purpose, the server maintains tables that map the locations of frames in the various rewind and fast-forward scan-support files. These tables are created on initialization and startup of the video server and are stored in memory. On receiving a scan request, the server checks the table for the location of the closest I frame in the corresponding file; the table entry is a pointer to the location in the file from which the video server should start transmitting. Once the server has the position in the file, it simply starts transmitting from there. The only operations needed during a scan are a table lookup and a seek to the correct position on disk, neither of which is time consuming.
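The annotation table can be built with a single pass over each scan-support file. The sketch below, an illustration rather than the paper's implementation, records the byte offset of every picture start code (0x00000100 in MPEG-1 video); a real implementation would also read the picture_coding_type field so that only I frames are indexed:

```python
# Build a frame-location table by scanning a byte-aligned MPEG-1
# stream for picture start codes (00 00 01 00). Illustrative sketch.

PICTURE_START = b"\x00\x00\x01\x00"

def picture_offsets(stream: bytes):
    """Byte offsets of every picture start code in `stream`."""
    offsets, pos = [], stream.find(PICTURE_START)
    while pos != -1:
        offsets.append(pos)
        pos = stream.find(PICTURE_START, pos + 1)
    return offsets

def seek_point(offsets, target_frame):
    """Offset from which the transmitter should start sending
    (clamped to the last indexed picture)."""
    return offsets[min(target_frame, len(offsets) - 1)]

# Synthetic example: two pictures at offsets 3 and 11.
data = b"hdr" + PICTURE_START + b"xxxx" + PICTURE_START + b"yy"
print(picture_offsets(data))   # [3, 11]
```

Servicing a scan request then reduces to the table lookup and seek described above.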
A further reduction in disk space can be achieved if, instead of maintaining actual scan-support files, we maintain tables that reference the I frames in the original version of the movie and create only the P and B frames in between. Thus, for a file compressed at 2X, the table entries for the new file would point to alternate I frames, with P and B frames predicted between them. Since I frames are approximately twice the size of P frames and typically five times the size of B frames [8], such a technique leads to a drastic reduction in the amount of extra disk space required.

5. PLAYOUT OPERATION AND TRANSITIONS

The video server keeps transmitting the original MPEG movie stream until it receives a scan request from the client. On receipt of such a request, the server checks whether it has a scan file at the requested rate. If it does, the server simply offsets to the correct location in that file and transmits from there until it receives another request from the client. However, since the client uses a dial for scan requests, it is possible (and likely) that the client will request a scan rate for which the server has no scan-support file. In such situations, the video server creates a rate/compression level on the fly that corresponds as closely as possible to the rate the client requested. The server selects the scan-support file whose playout rate is the greatest available rate less than the requested rate; for example, if the client requests a 5X rate, the server uses the 4X file. Instead of transmitting all the frames from this file, the server drops frames so that the effective rate approximates the rate requested by the client. The server cannot drop frames at random.
Since I frames are the information frames on which both P and B frames depend, an I frame cannot be dropped unless two consecutive groups of pictures are dropped. This is acceptable when the requested scan rate is extremely high; at low rates, however, it would leave large holes in the movie, making it difficult for the viewer to seek to the desired location. A P frame can be dropped only when all successive P and B frames up to the next I frame have also been dropped. B frames are interpolated from I and P frames, but no frames depend on B frames, so B frames can be dropped without dropping any extra frames.

On receiving a scan request from a client for which it has no scan file, the server computes the percentage of frames that must be dropped to achieve the effective scan rate the client is requesting. For example, if the requested rate is 9X, the server computes that approximately 11% of the frames must be dropped in every GOP. It then checks whether the required drop rate can be achieved by dropping B frames alone. If so, B frames are dropped uniformly across the GOP. If not, P frames are dropped as well, starting from the rightmost P frame in a GOP, and this is repeated as necessary; an entire GOP can be dropped if the required rate is still not satisfied. Dropping frames in this manner does not involve much processing, since every frame is clearly demarcated by start codes that can be located easily. In this way, the server adapts the scan rate to the rate requested by the client. Since the hierarchy always contains a file with a playout rate twice that of the current file, the maximum frame-drop rate is 50%, which can nearly always be accommodated using B frames only.
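A minimal sketch of this drop-selection rule for a single GOP (illustrative only; the uniform-spacing policy for B frames is from the text, while the simplified treatment of P-frame dependencies, dropping the GOP tail rightmost-first, is an assumption of this sketch):

```python
# Select which frames to drop in one GOP: B frames uniformly first;
# if that is not enough, drop the GOP tail (P frames and everything
# after them) starting from the rightmost frame. Sketch only.

def frames_to_drop(gop, fraction):
    """gop: pattern string like "IBBPBBPBB" in display order.
    Returns the sorted display indices of frames to drop."""
    need = round(len(gop) * fraction)
    b_idx = [i for i, t in enumerate(gop) if t == 'B']
    if need <= len(b_idx):
        # Enough B frames: pick them uniformly across the GOP.
        step = len(b_idx) / need if need else 0
        return sorted({b_idx[int(k * step)] for k in range(need)})
    # Otherwise drop all B frames plus the GOP tail, rightmost first
    # (a P frame may go only once everything after it is gone).
    drop = set(b_idx)
    i = len(gop) - 1
    while len(drop) < need and i > 0:   # never drop the leading I here
        drop.add(i)
        i -= 1
    return sorted(drop)

print(frames_to_drop("IBBPBBPBB", 1/9))   # [1]  -- a single B frame
print(frames_to_drop("IPPP", 0.5))        # [2, 3]  -- rightmost P frames
```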
In the frame pattern distribution discovered by Acharya and Smith [8], only content recorded using the I, IP, and IPPP frame patterns
could not accommodate 50% drop rates using B frames alone, and such content easily accommodates the drop rate by dropping P or I frames directly. Highly sophisticated encoders tend to use a greater percentage of B frames.

[Figure 4: Dropping frames. In the sequence IBBPBBPBBPBBPBBIBBPBBPBBPBBPBBI, the figure marks the set of frames that must be dropped to discard a P frame and the larger set that must be dropped to discard an I frame.]

An additional way to adapt the scan rate would be to add frames to a scan file that is compressed more than what the client requested. The server can repeat B frames to achieve the requested scan rate, or it could create additional B and P frames using prediction algorithms. Predictive frame interpolation algorithms are time-intensive and unlikely to achieve real-time performance. Adding B frames is simple, but can yield a jerky presentation. It would also be easy to modify the protocol to indicate that the client should play out slightly slower. Decreased playout is useful when the client requests a rate such as 31X. In such a case, instead of dropping a large percentage of frames from the 16X file, we could add some frames to the 32X file. Adding B frames, which are the smallest frame units, actually decreases the bandwidth requirement for playout, since the file is transmitted more slowly and has a lower average bits-per-frame ratio. A combination of the two techniques, dropping frames and adding frames, provides a way for the server to adapt the scan rate regardless of which scan files it maintains; it also allows some scan files to be removed, conserving disk space. We tested only frame-drop-based solutions in our implementation.

As an example, assume a 3Mbps base video file. In order to support fast-forward operations, VVD stores the base video file and a hierarchy of 2X, 4X, 8X, etc. scan files (experiments have limited fast-forward speeds to 64X). The 2X scan support file need not be stored at the same bit-rate, since quality during scans is less important.
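The storage arithmetic for such a hierarchy can be checked with a short calculation. This is our own sketch using the figures from this example (3 Mbps base, 1 Mbps scan files, speeds up to 64X); treating reverse support as a full-length 1X reverse file plus its own 2X-64X hierarchy is our reading of the text, and it reproduces the totals quoted below.

```python
# Storage-overhead sketch for the VVD scan hierarchy (illustrative).
BASE_RATE, SCAN_RATE = 3.0, 1.0          # Mbps

def scan_file_size(speed):
    # An NX scan file covers the movie in 1/N the time at SCAN_RATE,
    # so relative to the base file its size is (SCAN_RATE/BASE_RATE)/N.
    return (SCAN_RATE / BASE_RATE) / speed

forward = sum(scan_file_size(n) for n in (2, 4, 8, 16, 32, 64))
# Assumption: reverse playback needs a full-length 1X reverse-encoded
# file in addition to its own 2X..64X hierarchy.
reverse = scan_file_size(1) + forward

print(f"forward scan overhead: {forward:.1%}")            # 32.8%
print(f"forward + reverse:     {forward + reverse:.1%}")  # 99.0%
```

The forward hierarchy thus costs just under 33% extra storage, and forward plus reverse together cost about 100%, matching the figures in the text.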
For this example, assume scan support files are encoded at 1Mbps. The decoder must be modified to accommodate the switching bit-rates, but this is a simple modification. Because the bit-rate is 1/3 that of the base stream and the file is half the length, the 2X scan support file requires 1/6 the storage. The next levels of the hierarchy require 1/12, 1/24, etc. With this structure, the scan support files for forward scan require less than 33% overhead. If a 1Mbps reverse scan support file is assumed, the total overhead for forward and reverse scan support is about 100%. Sharing I frames among the scan support files can further reduce that overhead.

When B frames are dropped in order to increase the playout speed of a video sequence, the bit-rate increases. Since B frames are the smallest and most efficient unit of picture storage, dropping them reduces the average compression ratio for the sequence. The worst case of 50% frame drops and zero-length B frames would double the bit-rate. In practical systems, B frames are about half the average frame size, so dropping 50% of the frames discards about 25% of the data, for a bandwidth increase of 50% during 2X playback. In VVD, this is only an issue for playout in the range 1X to 2X, since all remaining streams are compressed at lower bit-rates and do not exceed 1.5Mbps for this example. For playout in the 1X to 2X range, the audio content is discarded, allowing a decrease in data requirements of 10-20%, but the bit-rate can still exceed 4Mbps. Solutions to this problem include disallowing scan speeds from about 1.5X to 2X, providing an additional scan support file at 1.5X, or allowing only scans of 2X or greater. Compared to conventional VCR controls, limiting scan support in the 1.5X to 2X range would be a minimal inconvenience.

6. CONCLUSIONS

MPEG is a common standard for the transmission of audio/video information.
Since it is a high-quality and efficient compression algorithm, it is likely to be used by video servers for movies, educational materials, and other Video on Demand applications. With the advent of HDTV, which is based on MPEG compression, viewers will have MPEG decoders in their televisions, and its use, along with technologies such as WebTV, will increase the popularity of VoD not only on cable but over the Internet as well. With rapidly declining memory prices and the arrival of DVD players that can store nearly 100GB of data, the overhead incurred by VVD might well be more than offset by the flexibility it provides in supporting varying-rate scan operations in real time. VVD can also be used in tandem with dynamic service aggregation [3] to provide a complete solution for video on demand over limited-bandwidth networks.

Some significant questions remain open about this technology. Future work will examine the minimum possible overhead for the VVD movie hierarchy and the most efficient means of transitioning among streams. The latency of transmitting scan requests, buffering video responses, and presenting the result does slow the appearance of the operation, though not as much as in some common tape machines. Transferring some of the VVD operation to the client can help alleviate this latency by allowing operations to appear immediate.

REFERENCES

1. P. J. Shenoy, H. M. Vin, "Efficient support for interactive operations in multi-resolution video servers," ACM Multimedia Systems Journal, 1999 (to appear).
2. M. Chen, D. D. Kandlur, and P. S. Yu, "Support for fully interactive playout in a disk-array-based video server," Proceedings of ACM Multimedia 1994, pp. 391-398, October 15-20, 1994, San Francisco, CA.
3. P. Basu, A. Narayanan, R. Krishnan, T. D. C. Little, "An implementation of dynamic service aggregation for interactive video delivery," Proceedings of Multimedia Computing and Networking 1998, pp. 111-122, January 26-28, 1998, San Jose, CA.
4. J. K. Dey-Sircar, J. Salehi, J. F. Kurose, D. Towsley, "Providing VCR capabilities in large-scale video servers," Proceedings of ACM Multimedia 1994, pp. 25-32, October 15-20, 1994, San Francisco, CA.
5. Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5Mbit/s - Part 2: Video, International Standard ISO/IEC 11172-2.
6. D. Le Gall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, 34(4):46-58, April 1991.
7. L. A. Rowe, K. D. Patel, B. C. Smith, K. Liu, "MPEG video in software: Representation, transmission, and playback," Proceedings of High Speed Networking and Multimedia Computing, February 1994.
8. S. Acharya, B. Smith, "An experiment to characterize videos stored on the web," Proceedings of SPIE Multimedia Computing and Networking 1998, pp. 166-178, January 26-28, 1998, San Jose, CA.
9. K. L. Gong and L. A. Rowe, "Parallel MPEG-1 video encoding," Proceedings of the 1994 Picture Coding Symposium, Sacramento, CA, September 1994.