Stream Conversion to Support Interactive Playout of Videos in a Client Station. Ming-Syan Chen and Dilip D. Kandlur. IBM Research Division


Thomas J. Watson Research Center, Yorktown Heights, New York 10598
{mschen, kandlur}@watson.ibm.com

Abstract

In this paper we address the problem of supporting interactive playout of an MPEG encoded video stream at a player device. We propose a method for transforming the standard MPEG stream to a local form at the player device, which then enables the device to support efficient interactive playout, specifically backward playout, even when the available buffer space is constrained. Explicitly, we devise a stream conversion scheme that encodes P frames as I frames after the decompression and playout of each P frame. Such a transformation of a P frame to an I frame is termed P-I conversion. With P-I conversion, backward playout requires no more buffer memory than normal forward playout, thus avoiding the large buffer requirement that backward playout would incur in the original MPEG stream because of interframe dependencies. Since P-I conversion is performed after a P frame is decompressed and played out, it incurs no extra decoding cost. Also, since no motion estimation or compensation is required to compress a single frame into an I frame, this I frame encoding is done very efficiently. We have evaluated several potential interactive playout methods using video experiments. P-I conversion is shown to be cost-effective, easy to implement and able to provide high visual quality interactive playout, and is therefore deemed a viable approach to supporting interactive playout for MPEG video in a client station.

Index Terms: Stream conversion, downloading, interactive playout, MPEG, video-on-demand.

1 Introduction

Recent advances in technologies such as computing, storage, and communication have made possible the creation of several exciting multimedia applications [6, 11]. Compression techniques play a key role in processing digital multimedia data, particularly video data. Several factors drive the use of compressed video data: (1) the prohibitively large storage required for uncompressed video data, (2) relatively slow storage devices that are unable to retrieve video data for real-time playout unless the data is compressed, and (3) network bandwidth that does not allow real-time transmission of uncompressed video. For example, a single uncompressed color frame of 620 × 560 pixels at 24 bits per pixel requires approximately one Mbyte of storage. At a full-motion playout rate of 30 frames per second, a 30-minute video would require more than 50 Gbytes of storage.

Currently, the prevalent standard for compressed video/audio is ISO MPEG [1]. The inter-frame compression techniques provided by MPEG offer significant advantages in storage and transmission. In order to facilitate storage and retrieval, the MPEG standard defines a compressed stream whose rate is bounded. Interactive TV and video-on-demand (VOD) have been identified as two important services made possible by advances in video compression and network transmission technologies [4]. In a VOD system, multimedia streams are stored on a storage server (the video server) and played out to the user station upon request. A significant amount of effort has been devoted to the design of video servers [7, 8, 9, 10, 12, 14]. A VOD server is expected not only to serve many clients concurrently (hundreds or more), but also to provide many interactive features for video playout, such as pause/resume, backward play, and fast-forward and fast-backward play, which home viewers have come to expect from their current VCR systems.
However, recent studies indicate that to meet these requirements, the server would need a tremendous amount of computing power, storage, and communication bandwidth [2, 5]. As pointed out in [2], the inter-frame dependencies of MPEG make it prohibitively expensive to provide some interactive features over the network. Furthermore, such factors as skewed movie requests and peak-hour activities make it very difficult, if not impossible, to achieve a cost-effective resource allocation (in terms of CPU, storage and network bandwidth) in a VOD system. Consequently, the feasibility of providing interactive movie viewing over the network (including backbone and cable networks) is unclear.

To avoid the above drawbacks, we consider in this paper an alternative solution for the movie-on-demand service. This solution involves downloading the video data into the storage of a player device located at the customer premises, so that the customer can subsequently view the video without further intervention from the network. It can be shown that such a player device is economically feasible. Using the MPEG technique, a 100 minute MPEG-1 movie will occupy 1 to 1.5 GBytes of storage. Currently, the price of a 1.6 GByte disk is slightly under $400, and it is expected that this cost will fall rapidly in the years to come due to rapid increases in recording densities, thus entering the price range of consumer products. With the current disk bandwidth (e.g., a SCSI disk), downloading a 100 minute MPEG-1 movie from the remote video server over the network to the disk of a client station will take only 3 to 5 minutes, close to the length of a TV commercial break, which is generally acceptable to end viewers. With video data stored in the player's storage, viewers can then enjoy all the interactive features for video viewing without consuming any server resources or network bandwidth. In addition, since downloading can be done prior to viewing, the effects of skewed movie requests and peak-hour activities can be minimized.

Figure 1 illustrates an environment for a movie-on-demand system that has a video player device at the customer premises. In this environment, video data is stored in the video server and transmitted to the player device upon request. The transmission may occur at high speed so as to permit downloading of the entire movie within a few minutes of elapsed time. Note that under the downloading model, the VOD server is able to transmit videos as fast as possible without explicit pacing. Also, since this is not real-time playout, the server has the flexibility of queuing up requests for downloading. Based on these trends, we believe that downloading digital movies into a player device at the customer premises for viewing is economical and desirable, and will be a competitive alternative to real-time VOD.

[Figure 1: An illustration of video servers and client stations. Video servers and video player devices are connected through a backbone and community network, together with an archive.]
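The storage and transfer figures quoted in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes decimal units (1 MByte = 10^6 bytes, 1 GByte = 10^9 bytes), which the text does not specify.

```python
# Back-of-the-envelope check of the Section 1 figures.
# Assumption (not stated in the text): decimal units are used.

def raw_frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8

def raw_video_bytes(width, height, bits_per_pixel, fps, minutes):
    """Size in bytes of an uncompressed video of the given duration."""
    frames = fps * minutes * 60
    return frames * raw_frame_bytes(width, height, bits_per_pixel)

def required_rate_mbytes_per_s(size_gbytes, minutes):
    """Sustained transfer rate needed to move size_gbytes in the given time."""
    return size_gbytes * 1000 / (minutes * 60)

if __name__ == "__main__":
    print(raw_frame_bytes(620, 560, 24))                # 1041600 bytes, about 1 MByte
    print(raw_video_bytes(620, 560, 24, 30, 30) / 1e9)  # 56.2464 GBytes, more than 50
    # Downloading 1 to 1.5 GBytes in 3 to 5 minutes implies roughly 3.3 to 8.3 MBytes/s.
    print(round(required_rate_mbytes_per_s(1.0, 5), 1))
    print(round(required_rate_mbytes_per_s(1.5, 3), 1))
```

Under these assumptions, the quoted download time corresponds to a sustained rate of a few MBytes per second along the whole path from the server to the client disk.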

Given this downloading model, it is important to enhance the product competitiveness of such a player device by minimizing its system resource requirements. Even with this device, one still encounters a deficiency when interactively playing MPEG movies, which arises in backward playout (and also in fast-backward playout). As will be described in detail later, due to the interframe dependency of MPEG, backward playout requires a large amount of buffer memory to store the many decompressed frames needed for decoding a single frame (see footnote 1). Since the video player, as with most consumer products, is a very price-sensitive component, such a requirement for large memory buffers would be highly undesirable for product competitiveness.

To address this issue, we propose in this paper an efficient method which uses stream conversion to support interactive playout for MPEG videos and minimizes the buffer requirement in the player device. In essence, we seek to employ a compressed stream that is compatible with the standards for video data distribution, and to enhance it locally for special effects. The standard stream is defined to be highly compressed, so as to minimize the cost of distribution, such as storage and network transmission, whereas the local stream is optimized for effective playback. While the concept of transforming a standard compressed stream into a local stream can be realized in many ways, we explicitly utilize this concept to devise a scheme that encodes P frames as I frames after the decompression and playout of each P frame, so as to efficiently support interactive playout for MPEG videos. Such a transformation of a P frame to an I frame is termed P-I conversion in this paper. A detailed description of MPEG data organization is given in Section 2. Since P-I conversion is performed after a P frame is decompressed and played out, there is no extra cost required for decoding.
More importantly, since no motion estimation or compensation is required to compress a single frame into an I frame, this I frame encoding can be done very efficiently. As will be seen later, with P-I conversion, backward playout requires no more buffer memory than normal playout, thus avoiding the large buffer that backward playout would need in the original MPEG stream because of interframe dependencies. It is understood that one could transform all P and B frames into I frames, i.e., convert MPEG to JPEG [15], after their playout. However, P-I conversion is preferable to MPEG-JPEG conversion in that it requires a much smaller amount of secondary storage in the player device and still enables backward playout to operate with the minimal amount of buffer, i.e., the amount required for MPEG forward playout. We shall describe the method of P-I conversion and its implementation in detail, and provide the corresponding cost analysis. Since the backward operation is mostly employed in the form of fast-backward playout, several video experiments are conducted to evaluate various types of fast-backward playout and show the advantage of using P-I conversion. Among the cases evaluated, the interactive playout achieved with P-I conversion is shown to be the most acceptable visually.

Footnote 1: It should be pointed out, nevertheless, that such a deficiency is not unique to the downloading model, and in fact also exists for real-time VOD.

It is worth mentioning that most prior work on VOD has focused on resource management and task scheduling in the video server [7, 8, 9, 10, 12, 14]; little work has been reported that either discusses the provisions required for the downloading model or deals with stream conversion to facilitate interactive playout in the client station. This feature distinguishes this work from others. Video playback by software was dealt with in [13]. However, the work in [13] does not modify the MPEG sequence, and consequently the decoder may have to hold more state (for reference frames) while performing backward playout. Overall, P-I conversion is shown to be cost-effective, easy to implement and able to provide high visual quality interactive playout, and is therefore deemed a viable approach to supporting interactive playout for MPEG video in a client station.

This paper is organized as follows. Section 2 describes the MPEG data organization. The method of stream conversion is presented in Section 3. Some visual experiments are conducted in Section 4. Section 5 summarizes our results.

2 MPEG Data Organization

The structure of the MPEG stream imposes several constraints on video data storage and playout. An MPEG video stream consists of intra frames (I), predictive frames (P), and interpolated frames (B).
In this stream, I frames are coded such that they are independent of any other frames in the sequence, and P frames are coded using motion estimation and depend on the preceding I or P frame. B frames, on the other hand, depend on two "anchor" frames: the preceding I/P frame and the following I/P frame. Since P and B frames use inter-frame compression, they are substantially smaller than I frames. Figure 2 shows the inter-frame dependencies in a sequence of MPEG frames, where the frames are numbered in temporal order. The arrows indicate the dependencies between frames. The inter-frame dependency implies that it is not possible to decode a P frame without the preceding I or P frame. Similarly, it is not possible to decode a B frame without the corresponding two anchor frames (i.e., two P frames, or one I frame and one P frame).

Figure 2: Inter-frame dependency in a sequence of MPEG frames.

Frame number:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Frame type:    I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I

Figure 3 shows the differences between the order in which compressed frames are presented to the decoder (presentation order) and the order in which decompressed frames are presented to the viewer (temporal order). It can be seen that for normal forward playout, it is necessary to keep exactly two decompressed frames in the memory buffer for decoding a frame that references these two frames. For the example in Figure 3, decompressed frames 1 (I frame) and 4 (P frame) are required to decode frame 2 (B frame). On the other hand, when decoding frame 5 (B frame), we need decompressed frames 4 and 7 (two P frames), and do not need frame 1 anymore. Since decompressed frames are all of the same size, we need buffer space for two decompressed frames to do the decoding for normal playout.

Figure 3: Temporal order (to the viewer) and presentation order (to the decoder) of MPEG frames.

Temporal order:              I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I ...
Frame number:                1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Presentation/storage order:  I  P  B  B  P  B  B  P  B  B  P  B  B  I  B  B ...
Frame number:                1  4  2  3  7  5  6 10  8  9 13 11 12 16 14 15

While this presentation order reduces the buffer space required for forward playout, it does not address the problem of backward playout. Since frames are encoded using forward prediction, displaying a particular frame may require decompressing a large number of preceding frames on which this frame depends. These decompressed frames are large and they increase the memory requirement of the decoder substantially. Moreover, the number of such buffers required increases linearly with the length of the chain of predicted frames. Consequently, to resolve this deficiency of MPEG, we shall present in the following section an approach of encoding P frames as I frames after the decompression and playout of each P frame. As will be seen later, such a P-I conversion significantly reduces the buffer requirement of a player device and facilitates interactive playout in the client station.

3 Compressed Stream Conversion for MPEG Videos

In this section, we describe a method to transform a standard compressed video stream to a local form at the player device. We present the method in the form of a P-I conversion for an MPEG stream, with the understanding that similar conversions can be devised for other video compression algorithms that utilize motion compensation. The basic idea of P-I conversion is described in Section 3.1 and its implementation is given in Section 3.2.

3.1 MPEG Stream Conversion

As explained earlier, although the presentation sequence of MPEG obviates the need for storing compressed frames during forward playout, it does not address the problems of backward playout. For example, consider the case where a viewer decides to play backward while viewing frame 14 in Figure 3 (at that moment decompressed frames 13 and 16 are in the buffer). The viewer can then view frame 13. However, to decode frame 12, the decoder needs decompressed frames 10 and 13. To obtain decompressed frame 10, the decoder needs decompressed frame 7, which in turn requires decompressed frames 4 and 1. Thus, to decode a P frame during backward playout, it is necessary to decode, in a reverse sequence, all the P frames until an I frame is reached. Note that this reverse chained-decoding is required for backward playout, but not for forward playout, since a P frame is encoded based on the previous I/P frame.
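The ordering and dependency rules described so far can be traced on the 16-frame example. The following is a minimal sketch and not part of the original system: the function names are hypothetical, and it assumes that the stream starts with an I frame and that every B frame has a following anchor.

```python
# Frame types of Figure 2 in temporal order (frame 1 is index 0).
FIGURE_2 = "IBBPBBPBBPBBPBBI"

def presentation_order(types):
    """Decoder order: each anchor (I/P) precedes the B frames that
    come just before it in temporal order."""
    order, waiting_b = [], []
    for number, kind in enumerate(types, start=1):
        if kind == "B":
            waiting_b.append(number)
        else:
            order.append(number)
            order.extend(waiting_b)
            waiting_b = []
    return order + waiting_b

def _anchor_chain(number, types):
    """Walk back from an anchor frame to the nearest I frame.
    Assumes the stream starts with an I frame."""
    chain = []
    while True:
        chain.append(number)
        if types[number - 1] == "I":
            return chain
        number -= 1
        while types[number - 1] == "B":
            number -= 1

def frames_needed_backward(frame, types):
    """Anchor frames that must be decompressed to decode `frame` when
    no earlier decompressed frame is available (reverse chained-decoding).
    Assumes every B frame has a following anchor."""
    if types[frame - 1] != "B":
        return _anchor_chain(frame, types)
    following = frame + 1
    while types[following - 1] == "B":
        following += 1
    preceding = frame - 1
    while types[preceding - 1] == "B":
        preceding -= 1
    needed = _anchor_chain(following, types)
    for number in _anchor_chain(preceding, types):
        if number not in needed:
            needed.append(number)
    return needed

if __name__ == "__main__":
    print(presentation_order(FIGURE_2))
    # [1, 4, 2, 3, 7, 5, 6, 10, 8, 9, 13, 11, 12, 16, 14, 15]
    print(frames_needed_backward(12, FIGURE_2))
    # [13, 10, 7, 4, 1]
```

For frame 12 the chain contains five anchor frames, whereas the same call on a stream in which every P frame has been replaced by an I frame returns only frames 13 and 10.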
The buffer space required for backward playout thus increases significantly (we need buffer space for 5 decompressed frames in this case). In order to facilitate backward playout, we propose a transformation of the standard MPEG stream into a local compressed form. Specifically, after a P frame is retrieved, decompressed and played out, we encode this frame as an I frame and store it back to the secondary storage. As mentioned before, since this P-I conversion is performed after a P frame is decompressed and played out, there is no extra cost required for decoding. Also, since no motion estimation or compensation is required to compress a single frame into an I frame, this I frame encoding can be done very efficiently. Figure 4 shows a snapshot of the compressed frames stored in the secondary storage when normal playout reaches frame 14 (when we keep decompressed frames 13 and 16 in the buffer and frame 13 is yet to be converted into an I frame).

Figure 4: A snapshot of the result of P-I conversion.

Frame number:                 1  4  2  3  7  5  6 10  8  9 13 11 12 16 14 15
Original frames stored:       I  P  B  B  P  B  B  P  B  B  P  B  B  I  B  B ...
Stored after P-I conversion:  I  I  B  B  I  B  B  I  B  B  P  B  B  I  B  B ...

It is important to see that with the proposed P-I conversion, the buffer space required for backward play is the amount needed to store two decompressed frames, i.e., the same as required for forward play. For example, consider again the case where a viewer decides to play backward while viewing frame 14 (with decompressed frames 13 and 16 in the buffer). The viewer next views frame 13, and is then able to view frame 12, which is decoded based on frames 10 and 13. With P-I conversion, frame 10 is now stored as an I frame in the secondary storage, and can be retrieved and decompressed by itself to be used for decoding frame 12. The reverse chained-decoding required for backward playout in the original MPEG stream is thus avoided. After the P-I conversion, the temporal order (to the viewer) and presentation order (to the decoder) for backward playout are as shown in Figure 5.

Figure 5: Temporal order (to the viewer) and presentation order (to the decoder) for backward playout.

Temporal order:      I  B  B  I  B  B  I  B  B  I  B  B  I  B  B  I ...
Frame number:       16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1 ...
Presentation order:  I  I  B  B  I  B  B  I  B  B  I  B  B  I  B  B ...
Frame number:       16 13 15 14 10 12 11  7  9  8  4  6  5  1  3  2

It can be shown that the additional storage required due to this P-I conversion is small. Assuming that the ratio of frame sizes for I, P and B frames is 5:3:2, Table 1 shows the additional secondary storage required for P-I conversion and MPEG-JPEG conversion (as a percentage of the size of the original MPEG stream) for various MPEG streams. The frame ratio in Table 1 indicates the mix of I, P and B frames in the MPEG stream. For instance, the MPEG stream in Figure 2 has a frame ratio of 1:4:10. Consider this stream as an example. Since the ratio of frame sizes for I, P and B frames is 5:3:2, the percentage of size increase by P-I conversion will be

\[ \frac{(5 \times 5 + 2 \times 10) - (5 \times 1 + 3 \times 4 + 2 \times 10)}{5 \times 1 + 3 \times 4 + 2 \times 10} = \frac{8}{37} \approx 21.6\%. \]

On the other hand, the percentage of size increase by MPEG-JPEG conversion for this stream will be

\[ \frac{5 \times 15 - (5 \times 1 + 3 \times 4 + 2 \times 10)}{5 \times 1 + 3 \times 4 + 2 \times 10} = \frac{38}{37} \approx 102.7\%. \]

Following the same procedure, the numbers in the other columns can be obtained. It can be seen that while requiring the same amount of buffer for decoding (see footnote 2), P-I conversion requires a significantly smaller amount of secondary storage than MPEG-JPEG conversion. From Table 1, it is seen that the additional secondary storage required by P-I conversion is small and fairly acceptable in view of the resulting benefits and the inexpensive nature of the secondary storage.

Table 1: Additional secondary storage required by P-I conversion and MPEG-JPEG conversion.

Frame ratio (I:P:B)     1:3:8    1:3:12   1:4:10   1:4:15
P-I conversion          20.0%    15.7%    21.6%    17.0%
MPEG-JPEG conversion    100.0%   110.5%   102.7%   112.7%

Footnote 2: It is noted that a player device for MPEG-JPEG conversion also requires the buffer for two decompressed frames to handle an incoming standard MPEG stream.

3.2 Implementation

In this subsection, we show that the proposed P-I conversion method is easy to implement. To illustrate the apparatus required to implement this transformation, Figure 6 shows a block diagram of a player device. This player contains an input device, a temporary storage device, a video/audio decoder, an encoder, control logic, buffer memory, and display logic. The input device may be a network interface, a CD-ROM reader, or a similar device, and it is used for reading an incoming standard video stream. The temporary storage device, which is a read/write device, is used to store the transformed video stream. The control logic implements the different operations on the device, while the conventional display logic is used to display images from the buffer. It is worth mentioning that, depending on the application, the downloading process could be performed in a pipelined manner so as to further reduce the local storage required, in which case, however, additional provision is needed to ensure the execution of interactive functions.

The execution flow for decoding during normal playout is shown in Figure 7, where the player reads consecutive video frames, decodes them, and displays them. The decoder operations are determined by the frame type, and they rely on two decompressed "anchor" frames.
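The effect of the conversion step in this flow on the stored stream, and the storage estimates of Table 1, can be modeled in a few lines. This is only a sketch under the paper's stated assumption of a 5:3:2 size ratio for I, P and B frames; the function names are hypothetical, and actual decoding and encoding are represented by a comment.

```python
# Relative frame sizes assumed in Section 3.1 (I:P:B = 5:3:2).
SIZE = {"I": 5, "P": 3, "B": 2}

def forward_playout_with_conversion(stored_types):
    """Model one full forward playout over frames in storage order.
    Each P frame is re-encoded as an I frame after it is played out."""
    converted = []
    for kind in stored_types:
        # Decoding and display would happen here:
        # I alone, P from one anchor, B from two anchors.
        converted.append("I" if kind == "P" else kind)
    return "".join(converted)

def mpeg_to_jpeg(stored_types):
    """Model MPEG-JPEG conversion: every frame is stored as an I frame."""
    return "I" * len(stored_types)

def stream_size(types):
    return sum(SIZE[kind] for kind in types)

def overhead_percent(original, converted):
    """Additional storage as a percentage of the original stream size."""
    base = stream_size(original)
    return 100.0 * (stream_size(converted) - base) / base

def stream_with_ratio(i, p, b):
    """A stream with the given I:P:B frame mix (order does not affect size)."""
    return "I" * i + "P" * p + "B" * b

if __name__ == "__main__":
    # Storage order of Figure 3, after playout has converted every P frame.
    print(forward_playout_with_conversion("IPBBPBBPBBPBBIBB"))  # IIBBIBBIBBIBBIBB
    for ratio in [(1, 3, 8), (1, 3, 12), (1, 4, 10), (1, 4, 15)]:
        original = stream_with_ratio(*ratio)
        p_i = overhead_percent(original, forward_playout_with_conversion(original))
        jpeg = overhead_percent(original, mpeg_to_jpeg(original))
        print(ratio, round(p_i, 2), round(jpeg, 2))
```

The computed percentages agree with Table 1 up to rounding in the last digit. Note that the snapshot in Figure 4 differs from the fully converted stream only in frame 13, which has not yet been re-encoded at that moment.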
If the frame is an I frame, it is decompressed without depending on any other frame. The decompressed frame is retained in the buffer memory as an anchor frame for successive frames. If the frame is a P frame, it is decompressed using the preceding anchor frame. This decompressed frame is also retained in the buffer memory as an anchor frame. A B frame is decoded by referencing the two anchor frames and is played out without being retained. In Figure 7, the dotted line indicates the extra operations required to convert P frames to I frames. The decompressed frame is now compressed as an I frame and stored back to the secondary storage. Since this P-I conversion is performed after a P frame is decompressed and played out, it does not impose any additional cost or delay on the decoder. Also, since the encoding process does not require any CPU-intensive motion search/estimation, it can be performed easily in real time. Note that we cannot perform an in-place editing of P frames to I frames, since an I frame is in general larger than a P frame. Instead, one can copy a group of pictures (GOP), modify it, and then store it back. The original GOP can then be deleted.

[Figure 6: A player device with the capability of transforming a standard compressed stream into its local form. Blocks: input (network or CD-ROM), decoder, encoder, buffer, control logic, secondary storage, display logic, output display.]

[Figure 7: Execution flow for the decoder during forward playout. While more frames remain, the decoder branches on the frame type. An I frame is decoded by itself, played out, and retained as an anchor frame. A P frame is decoded by referencing the prior I/P frame, played out, retained as an anchor frame, then encoded as an I frame and stored in the secondary storage. A B frame is decoded by referencing the two anchors and played out.]

The execution flow for decoding during backward playout is shown in Figure 8, where the player reads successive video frames, decodes them, and displays them. The order of frame retrieval for backward playout is the reverse of the retrieval order for forward playout. As in forward playout, frames are presented to the decoder in an order which is different from the temporal order. For example, in Figure 5, frame 13 is decoded before frame 15 since frame 13 is an anchor frame that is required for decoding frame 15. However, frame 15 is presented (displayed) before frame 13. Since P frames are replaced by I frames during forward play, the only frame types encountered during backward play are I and B frames. These frames are decoded and displayed in a manner that follows the description above for forward playout.

[Figure 8: Execution flow for the decoder during backward playout. While more frames remain, the decoder branches on the frame type. An I frame is decoded by itself, played out, and retained as an anchor frame. A B frame is decoded by referencing the two anchors and played out.]

Note that it is, in general, necessary to create offsets to I frames to access them efficiently during backward playout. However, such a need does not arise from the use of P-I conversion: in the original MPEG stream, creation of offsets to I frames is also needed for efficient frame access during backward playout. In addition, although the GOP size changes in the modified stream, an MPEG decoder can still operate effectively. In addition to the stream level, an MPEG decoder can be kept informed of the picture pattern at the GOP level. Hence, depending upon the type of frame to be processed, an MPEG decoder can take appropriate actions for intraframe or interframe decoding. As such, an MPEG decoder is able to handle different GOP sizes in one video stream.

4 Visual Experiments for Fast-Backward Playout

In our opinion, the backward operation will be used mainly in the form of fast-backward playout (see footnote 3). In light of this, we would naturally like to know the impact of P-I conversion on the visual effect of fast-backward (FB) playout. For comparison, we assume that the player device only has the amount of buffer needed for MPEG forward playout. The chained-decoding scenario described before for FB playout of the standard MPEG stream, which requires a much larger amount of buffer, is deemed unfavorable and is not included here for comparison. Specifically, we consider the following three types of FB playout.

1. First type of FB. Without downloading, we use the segment skipping method devised in [2] to implement FB playout.

2. Second type of FB. Without stream conversion, we only play I frames during the FB playout. This method can be used either with downloading or without downloading.

3. Third type of FB. With P-I conversion, an FB playout is performed using the backward playout technique described in Section 3.

Illustrative examples for these three types of FB are shown in Figure 9. Independent of their visual effects, it is noted that while the first and third types of FB can provide variable FB speeds, the second type of FB is restricted to certain FB speeds.
More specifically, the FB speedup that type 2 FB can provide has to be a multiple of the number of frames an I frame is associated with (i.e., 15, 30, 45, etc. in Figure 9b). Also, when type 2 FB is performed over the network, it has to be operated either at a higher cost (i.e., a higher data rate) or at a slower frame rate, since the size of an I frame is larger than the average size of a combination of I, P and B frames.

Footnote 3: This is consistent with the fact that most commercial VCRs provide only the fast-backward play option.

In order to perform this visual experiment, we used the MMT (Multimedia Multiparty Teleconferencing) system, a prototype desktop collaboration system developed at the IBM Thomas J. Watson Research Center [3]. The current MMT hardware consists of an IBM PS/2 computer equipped with two custom-built

[Figure 9: Three types of FB playout evaluated: (a) type 1 (FB speedup: 3; starting from segment i + 6), showing segments i-1 through i+6, each of the form IBBPBBP...B, marked as skipped or played; (b) type 2 (FB speedup: 15), showing 15-frame GOPs of the form IBBPBBPBBPBBPBB; and (c) type 3 (FB speedup: 4).]

[Figure 10: The MMT System. Labeled components: CPU, system memory, microchannel bus, comm adapter, CODEC adapter, VAC adapter, Ethernet network, and audio/video in/out.]

adaptors. One adaptor (VAC) interfaces with analog video and audio components and performs capture (digitization) and playback functions. The other adaptor (MMT) is for compression, networking, processing, and decompression of video and audio. Figure 10 shows one of the possible data paths through the MMT system. In this scenario, the video application extracts video and audio data from the MMT adaptor and transports it across the network.

This video application has been modified to (a) store the video data to a file, and (b) play out video data from a file. In the first instance, the timing/pacing for the recording process is provided naturally by the video source, and data is generated whenever a new video frame is captured and compressed. However, in the case of stored video playout, no suitable timing signal is available from the operating system to pace the playout process. We therefore chose to exploit the video source to provide the pacing for the playout. The playout application configures the MMT adaptor for the capture and playout of video. It then interprets the incoming video data to demarcate frames and paces the playout by ensuring that the number of video frames played out to the MMT adaptor closely corresponds to the number of video frames captured.

We developed a filter program that processes the stored video to produce an output file with the appropriate segment size and speedup. This filter represents an off-line implementation of the FB mechanism.
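As an illustration of what such an off-line filter computes, the following is a minimal sketch of frame selection for two of the FB types: segment skipping and I-frame-only playout. It is not the filter program used in the experiments. The function names, the `(index, type)` frame representation, and the choice of one 15-frame GOP per segment are assumptions made for illustration, guided by the layout in Figure 9. The P-I-conversion-based FB depends on the technique of Section 3 and is not sketched here.

```python
from typing import List, Tuple

# Hypothetical frame record: (index in the original stream, frame type).
Frame = Tuple[int, str]


def make_stream(num_gops: int, gop_pattern: str = "IBBPBBPBBPBBPBB") -> List[Frame]:
    """Build a toy stream of frame records with a fixed GOP pattern."""
    types = gop_pattern * num_gops
    return [(i, t) for i, t in enumerate(types)]


def segment_skipping_fb(stream: List[Frame], segment_size: int, speedup: int) -> List[Frame]:
    """Segment skipping: walk the segments backward and play one out of
    every `speedup` segments. Frames inside a played segment keep their
    forward order, since the decoder only plays forward."""
    segments = [stream[i:i + segment_size]
                for i in range(0, len(stream), segment_size)]
    out: List[Frame] = []
    for seg_no in range(len(segments) - 1, -1, -speedup):
        out.extend(segments[seg_no])
    return out


def i_frame_only_fb(stream: List[Frame], gops_per_step: int = 1) -> List[Frame]:
    """I-frame-only FB: play I frames in reverse order, taking every
    `gops_per_step`-th one. The resulting speedup is a multiple of the
    GOP size (15 for the default pattern)."""
    i_frames = [f for f in stream if f[1] == "I"]
    return i_frames[::-1][::gops_per_step]


if __name__ == "__main__":
    stream = make_stream(8)  # 8 segments of 15 frames each
    played = segment_skipping_fb(stream, segment_size=15, speedup=3)
    print([played[k][0] for k in (0, 15, 30)])  # first frame of each played segment
    print([f[0] for f in i_frame_only_fb(stream)])
```

With eight segments and a speedup of 3, the sketch plays the last segment, then every third segment before it, which corresponds to the play/skip pattern of Figure 9a. Because each played segment runs forward while the sequence of segments runs backward, the sketch also makes it easy to see where back-and-forth motion can come from in this type of FB.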

Using these tools, we have experimented with these three types of FB playout for various video clips. The differences among these three types of FB depend on the video content. For video clips with very little motion, such as those showing weather maps and news announcements, the visual effect of these three FB methods is similar. However, for video clips containing objects moving in one direction, such as track actions, swimming, and long distance touch-down runs in a football game, the differences among these FB methods become very prominent. We perceived some zig-zag motion for type 1 FB, i.e., objects appear to be moving back and forth. Also, we observed that watching type 2 FB is similar to watching a slide projector operating at a high speed. On the other hand, type 3 FB is very smooth and was found to be the most visually acceptable, showing the better visual quality that results from P-I conversion.

Note that since the objective of our experiments is to determine the visual impact of different types of FB, and not the efficiency of the compression, using JPEG (in MMT) as the compression format will achieve the same visual effect. Also, the MMT provides us with facilities to capture various interesting scenes from TV programs for evaluation. Note that these visual experiments may also be performed using software MPEG decoders, such as the one from the University of California at Berkeley [13].

5 Conclusions

In this paper we have explored the notion of a consumer player device that provides interactive video functions for digital video streams. The model we considered involves downloading the video data into the storage of a player device located at the customer premises, so that the customer can view the video subsequently without further intervention from the network. In this context, we proposed the concept of stream conversion to support the interactive playout, both forward and backward, of MPEG encoded video streams.
This stream conversion results in a substantial reduction of the memory buffer requirement in the player device. We have described this method of P-I conversion and its implementation, and provided the corresponding cost analysis. We also conducted some visual experiments to evaluate various types of FB playout, which illustrate the advantages of using P-I conversion. Based on its cost-effectiveness, ease of implementation, and capability for providing high visual quality, we believe that P-I conversion is a viable approach to supporting interactive playout for MPEG video in a client station.

References

[1] Coding of moving pictures and associated audio - for digital storage media at up to about 1.5 Mbit/s, May 1994.

[2] M.-S. Chen, D. D. Kandlur, and P. S. Yu. Support for Fully Interactive Playout in a Disk-Array-Based Video Server. Proceedings of ACM Multimedia, pages 391-398, October 1994.
[3] M.-S. Chen, Z.-Y. Shae, D. D. Kandlur, T. P. Barzilai, and H. M. Vin. A Multimedia Desktop Collaboration System. In Proceedings GLOBECOM 92, pages 739-746, December 1992.
[4] D. Deloddere, W. Verbiest, and H. Verhille. Interactive Video on Demand. IEEE Communications Magazine, 32(5):82-88, May 1994.
[5] J. K. Dey, J. D. Salehi, J. F. Kurose, and D. Towsley. Providing VCR capabilities in large-scale video servers. In Proc. ACM MULTIMEDIA'94, pages 25-32, October 1994.
[6] W. I. Grosky. Multimedia Information Systems. IEEE Multimedia, pages 12-24, Spring 1994.
[7] D. D. Kandlur, M.-S. Chen, and Z.-Y. Shae. Design of a Multimedia Storage Server. In Proc. IS&T/SPIE Symposium on Electronic Imaging - Conference on High Speed Networking and Multimedia Applications. SPIE, February 1994.
[8] T. Mori, K. Nishimura, H. Nakano, and Y. Ishibashi. Video-on-demand system using optical mass storage system. Japanese Journal of Applied Physics, 1(11B):5433-5438, November 1993.
[9] R. T. Ng and J. Yang. Maximizing Buffer and Disk Utilizations for News on-demand. In Proceedings of the 20th International Conference on Very Large Databases, pages 451-462, September 1994.
[10] B. Ozden, A. Biliris, R. Rastogi, and A. Silberschatz. A Low-Cost Storage Server for Movie on Demand Databases. In Proceedings of the 20th International Conference on Very Large Databases, pages 594-605, September 1994.
[11] S. Ramanathan and P. V. Rangan. Architectures for Personalized Multimedia. IEEE Multimedia, 1(1):37-46, Spring 1994.
[12] P. V. Rangan, H. M. Vin, and S. Ramanathan. Designing A Multi-User Multimedia On-Demand Service. IEEE Communications Magazine, 30(7):56-65, July 1992.
[13] L. A. Rowe, K. D. Patel, B. C. Smith, and K. Liu. MPEG Video in Software: Representation, Transmission, and Playback. In Proc. IS&T/SPIE Symposium on High-Speed Networking and Multimedia Computing, pages 134-144. SPIE, February 1994.
[14] H. M. Vin and P. V. Rangan. Designing a Multi-User HDTV Storage Server. IEEE Journal on Selected Areas in Communication, 11(1):15-16, January 1993.
[15] G. K. Wallace. The JPEG still picture compression standard. Communications of the ACM, 34(4):30-44, April 1991.