
On the Design of a Low-Cost Video-on-Demand Storage System

Banu Ozden, Rajeev Rastogi, Avi Silberschatz
AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974-0636
{ozden, rastogi, avi}@research.att.com

Abstract

Recent advances in storage technology, coupled with the dramatic increase in the bandwidth of networks, now make it possible to provide "video on demand" service to viewers. A video on demand server is a computer system that stores videos in compressed digital form and supports concurrent access to, and transmission of, different portions of the compressed video data. In this paper, we present a low-cost storage architecture for a video on demand server that relies principally on disks. The high bandwidth of disks, in conjunction with a clever strategy for striping videos on them, is exploited to enable simultaneous access and transmission of different portions of a video, separated by fixed time intervals. We also present a wide range of schemes for implementing VCR-like functions including fast-forward, rewind and pause. Finally, we extend our schemes to the case when videos have different rate requirements.

1 Introduction

The video on demand (VOD) concept has become exceedingly popular with telecommunications, computer and cable companies. Viewers who subscribe to a VOD service have access to a much wider feature set than broadcast-based cable and TV networks offer. For example, a viewer can start watching a video, chosen from a particular set of videos, at any time the viewer wishes. While watching a video, a viewer can apply VCR operations like pause, resume, fast-forward and rewind to the video. Thus, VOD services differ substantially from today's broadcast cable services in which, at any given time, all viewers see the same portion of a video and have no control over its transmission. Also, unlike today's video stores, VOD services offer convenience (since viewers do not have to leave their homes) and typically result in lower response times.

Until recently, low network bandwidths and limited video storage technologies made offering VOD services to viewers a difficult task. However, today, networks built using optical fibers have bandwidths of several gigabits per second. Furthermore, not only is it now possible to store video data in digital form, but it is also possible to obtain high compression ratios. For example, display of video at 30 frames/sec that is compressed using the MPEG-1 [Gal91] compression algorithm requires a bandwidth of merely 1.5 Mbps. Thus, it is now possible to concurrently transmit independent video streams to thousands of viewers. Even though the problem of transmitting video data is considerably simplified by the availability of high bandwidth networks, the design and implementation of VOD storage servers that store and simultaneously retrieve different portions of videos remains a non-trivial problem.
A storage architecture for a VOD server must address the following issues: low cost, continuous retrieval of videos, support for VCR operations, and concurrent service to multiple viewers. In general, a VOD server will contain a cache to temporarily store the currently viewed videos. The currently viewed videos will be loaded onto the cache from a library which stores videos permanently (e.g., a jukebox of tapes). The cache can be designed with random access memory (RAM) as a flat architecture. However, this approach will increase the cost of the VOD server substantially due to the high cost of RAM and the high storage requirements of videos. For example, an MPEG-1 compressed 100 minute video with an average bandwidth of 1.5 Mbps requires approximately 1.125 GB of storage. Assuming the cost of RAM is $50.00 per MB, the cost of a RAM-based cache to store 100 videos will exceed $5.5 million.

In this paper, we propose a storage hierarchy to design a low-cost cache for a VOD server. The hierarchy consists of disks, which store the currently viewed videos, and a small amount of RAM buffers, which store only portions of the videos. Due to the low cost of disks (approximately 30 cents per MB), the cost of a VOD server based on our architecture is substantially less than that of one in which the entire video is loaded into RAM. However, unlike in a RAM-based architecture, access times to random locations on disks are relatively high. Therefore, clever storage allocation schemes must be devised to continuously retrieve different portions of a video for a large number of users and, at the same time, to minimize the buffering requirements. For the same reasons, the implementation of VCR operations like fast-forward, rewind, pause and resume is a difficult task.

We present a "phase-constrained" storage allocation scheme which enables a large number of different parts of a video to be viewed simultaneously, and a variety of schemes for implementing the VCR operations. The schemes illustrate the trade-off between the size of the RAM buffers required and the quality of the VCR-type service, in particular, the abruptness in display perceived by viewers during fast-forward/rewind operations as well as the response time for switching back to normal display mode from pause, fast-forward and rewind modes. The lower costs of schemes that provide limited functionality for fast-forward, rewind and pause make them attractive for a wide range of environments.

The remainder of the paper is organized as follows. In Section 2, we present an overview of the system architecture that provides VOD services. We present our scheme for storing videos on disks in Section 3. In Section 4, we describe how VCR operations can be implemented in our architecture. Schemes for implementing fine granularity fast-forward and rewind are presented in Section 5.
In Section 6, we present buffering schemes that provide the same functionality as if videos were stored in RAM, but do not require entire videos to be stored in RAM. We extend our scheme to handle multiple videos with varying rate requirements and multiple disks in Section 7. A comparison of our work with related work in the area can be found in Section 8. Concluding remarks are offered in Section 9.

2 Overall System Architecture

In this section, we present an overview of the system architecture for supporting VOD services. The main system component, the VOD server, is a computer with one or more processors, and a cache to hold a set of currently viewed videos in compressed form. The cache is updated from a library of videos at the same site or from a library or a cache at another site.

[Figure 1: System architecture for VOD servers. Labeled components: Movie Library, MOD Server, RAM Buffer, Network, and a Decoder and Display for each viewer.]

The cache that stores the currently viewed videos can be designed as a flat architecture consisting only of RAM. Due to the high cost of RAM, however, this approach makes a VOD server prohibitively expensive. Therefore, we propose a two-level cache architecture consisting primarily of secondary storage devices and a limited amount of RAM. The second level of the cache consists of disks, which store the currently viewed videos, while the first level consists of the RAM buffer, which temporarily holds portions of videos currently being displayed. Due to the lower cost of disks, our approach yields cheaper VOD services.

Figure 1 illustrates the overall architecture for VOD servers. The compressed data for video \(V_i\) is transmitted at a rate of \(r_i\) over a high bandwidth network individually to every viewer that requests the video. The number of viewers serviced by a single VOD server would vary depending on the geographical location. However, we expect this number to be between 5,000 and 10,000. Every viewer has a decoder, which consumes the compressed data for video \(V_i\) from a local buffer at a rate of about \(r_i\) and outputs frames to a display at the playback rate (which is typically 30 frames/sec).

Viewers can issue commands to control the display of a video that is stored in the VOD server. These commands include begin, fast-forward, rewind, pause and resume. The commands are transmitted to the VOD server, which maintains information relating to the status of every viewer (e.g., the last command executed by the viewer, the position in the video of the last bit transmitted to the viewer). While a video is being displayed, viewers can apply any of the above commands to control the display of the video.

We refer to the transmission of a video starting at a given time as a phase. Two phases may correspond to the same or different videos. To simplify our discussion, we will initially present our results assuming that only one video with rate \(r_d\) is being handled by the VOD server. In addition, we assume that the server has a single disk with bandwidth \(r_t\). In Section 7, we show how our results can be generalized to multiple videos with varying rates and multiple disks.

The maximum number of concurrent phases that can be supported by retrieving video data from the disk, denoted by \(p\), is given by:

\[ p = \left\lfloor \frac{r_t}{r_d} \right\rfloor \qquad (1) \]

The reason for this is that each phase requires video data to be retrieved from disk at a rate \(r_d\). Even if the server can support the maximum number of phases, this number is not, in general, sufficient to provide each viewer with an independent phase, since the number of viewers will typically be larger than the maximum number of phases. The challenge is to devise clever storage algorithms and buffering techniques to assign viewers to the right phases at the right times in order to provide on-demand video service with VCR functionalities.

Since the server can support a limited number of concurrent phases that are shared between several viewers, it follows that in order to provide low average response times when viewers request a video, and to provide good implementations of VCR operations, the following two properties should be preserved:

1. The VOD server must support the maximum number of phases.
2. The phases must be uniformly distributed across the entire video.

Thus, in a VOD server that preserves the above two properties, if the length of the video is \(l\) seconds, then the difference between any two adjacent concurrent phases is \(l/p\). Our goal is to devise storage allocation schemes that support the maximum number of concurrent phases, each of which is \(l/p\) apart from its neighbors. Furthermore, in order to keep the cost of the system low, the storage allocation scheme must not require large amounts of video data to be buffered in RAM.
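Equation 1 and the uniform phase spacing \(l/p\) are easy to evaluate numerically. The following is a minimal sketch; the function names are illustrative and not part of the paper, and the sample values (an 80 Mbps disk, a 1.5 Mbps video, a 100 minute video) are the ones the paper uses in its later examples.

```python
import math


def max_phases(disk_rate: float, video_rate: float) -> int:
    """Equation 1: p = floor(r_t / r_d); both rates in the same unit."""
    return math.floor(disk_rate / video_rate)


def phase_spacing(video_length_s: float, p: int) -> float:
    """Spacing l / p, in seconds, between adjacent uniformly spaced phases."""
    return video_length_s / p


if __name__ == "__main__":
    p = max_phases(80.0, 1.5)        # assumed inputs: r_t = 80 Mbps, r_d = 1.5 Mbps
    spacing = phase_spacing(100 * 60, p)
    print(p, round(spacing, 2))
```

With these inputs the server supports 53 phases, spaced a little over 113 seconds apart.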

3 The Basic Storage Architecture

In this section, we propose a storage architecture that enables a VOD server to support the maximum number of concurrent phases with fixed phase differences, and requires only a small amount of buffer space to be maintained per phase. Before presenting our scheme, we first show that adopting the naive approach of storing the video data contiguously on disk requires large amounts of video data to be buffered in RAM, which implies that we need to develop better allocation schemes.

For every phase, the video data from disk is retrieved into a RAM buffer of size \(b\) bits at a rate \(r_d\). In order to ensure that data for the \(p\) phases can be continually retrieved from disk at a rate \(r_d\), in the time that the \(b\) bits from \(p\) buffers are consumed at a rate \(r_d\), the \(b\) bits of the video following the \(b\) bits consumed must be retrieved into the buffers. Since each retrieval involves positioning the disk head at the desired location and then transferring the \(b\) bits from the disk to the buffer, we have the following inequality:

\[ \left( \frac{b}{r_t} + t_{lat} \right) \cdot p \le \frac{b}{r_d} \]

where \(t_{lat}\) is the disk latency. Hence, the size \(b\) of the buffer per phase can be calculated as

\[ b \ge \frac{t_{lat} \cdot r_d \cdot r_t}{\frac{r_t}{p} - r_d} \qquad (2) \]

Thus, the buffer size per phase increases both with the latency of the disk and with the number of concurrent phases. In the following example, we compute, for a commercially available disk, the sizes of portions of videos that need to be buffered in order to support the maximum number of concurrent phases.

Example 1: Consider a commercially available disk costing $3000 with a capacity of 9 GB, a transfer rate of 80 Mbps and a disk latency of 15 milliseconds. Suppose that MPEG-1 is used, which implies that \(r_d\) is 1.5 Mbps. Thus, the maximum number of concurrent phases that can be supported at a rate of 1.5 Mbps using the device is \(\lfloor 80/1.5 \rfloor = 53 = p\). Using Equation 2, we compute the buffer size \(b\) required to support 53 concurrent phases to be \(b \ge 190\) Mb.
Since we require 53 different buffers, the total storage requirements are \(53 \times 190 \approx 10\) Gb, which is larger than the size of a 111 minute video. □

3.1 Phase-Constrained Allocation

In order to keep the amount of buffer per phase low, we propose a new storage allocation scheme for a video on disk, which we call the phase-constrained allocation scheme. The phase-constrained allocation scheme lays out video data on disk in an interleaved fashion such that retrieving the video data sequentially from disk enables portions at fixed intervals in the video to be retrieved concurrently. Thus, the phase-constrained scheme eliminates seeks to random locations, and thereby enables the concurrent retrieval of the maximum number of phases \(p\), while keeping the buffer size per phase constant, independent of the number of phases and disk latencies. Note, however, that the time difference between any two adjacent phases is fixed.

Let \(l\) be the length of a video in seconds. Thus, the storage occupied by the video is \(l \cdot r_d\) bits. Suppose video data is read from disks in portions of size \(d\). We shall assume that \(l \cdot r_d\) is a multiple of \(p \cdot d\).¹ Our goal is to be able to support \(p\) concurrent phases of the video. In order to do this, we chop the video into \(p\) contiguous partitions. Thus, the video data can be visualized as a \((p \times 1)\) vector, the concatenation of whose rows is the video itself, and each row contains \(t_c \cdot r_d\) bits of video data, where

\[ t_c = \frac{l}{p} \]

For the disk in Example 1, and a 100 minute video, \(t_c = 6000/53 \approx 113.21\) seconds. We refer to \(t_c\) as the smallest phase difference since the first bits in any two adjacent rows are \(t_c\) seconds apart in the video. Since video data in each row is retrieved in portions of size \(d\), a row can be further viewed as consisting of \(n\) portions of size \(d\), where

\[ n = \frac{t_c \cdot r_d}{d} \]

Thus, a video can be represented as a \((p \times n)\) matrix of portions as shown in Figure 2. Each portion in the matrix can be uniquely identified by the row and column to which it belongs. Suppose we now store the video matrix on disk sequentially in column-major form. Thus, as shown in Figure 3, Column 1 is stored first, followed by Column 2, and finally Column \(n\). We show that by sequentially reading from disk, the video data in the \(p\) rows can be retrieved concurrently. Furthermore, the video data in each row can be retrieved at a rate \(r_d\).
Retrieving an entire column takes \(p \cdot d / r_t\) units of time. This is because columns are stored and retrieved sequentially one after another. From Equation 1, it follows that:

\[ \frac{p \cdot d}{r_t} \le \frac{d}{r_d} \qquad (3) \]

Therefore, in the time required to consume \(d\) bits of the video at a rate \(r_d\), an entire column can be retrieved from disk. As a result, while a portion in each row is being consumed at a rate \(r_d\), the next portion in each of the rows can be retrieved.

¹ The length of the video can be modified by appending advertisements, etc. to the end of the video.
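The column-major placement can be illustrated with a small sketch. It numbers the portions of the video 0, 1, 2, ... in playback order, lays them out as the \((p \times n)\) matrix described above, and checks that one sequential pass over the disk delivers every row its portions in order. The function names and the tiny values of \(p\) and \(n\) are illustrative assumptions, not taken from the paper.

```python
def layout(p: int, n: int) -> list[int]:
    """Disk order (column-major) of a video whose portions are numbered
    0 .. p*n - 1 in playback order; row i holds portions i*n .. i*n + n - 1."""
    disk = [0] * (p * n)
    for row in range(p):
        for col in range(n):
            disk[col * p + row] = row * n + col
    return disk


def sequential_read(disk: list[int], p: int) -> list[list[int]]:
    """Read the disk front to back; the k-th portion of every column
    is delivered to the k-th row (phase)."""
    streams: list[list[int]] = [[] for _ in range(p)]
    for position, portion in enumerate(disk):
        streams[position % p].append(portion)
    return streams


if __name__ == "__main__":
    disk = layout(p=3, n=4)
    print(disk)                      # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
    print(sequential_read(disk, 3))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

Each row receives consecutive portions of its own partition without any seek, which is the property the scheme relies on.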

[Figure 2: The video viewed as a matrix of \(p\) rows and \(n\) columns, each entry being a portion of size \(d\).]

[Figure 3: Placement of the \(n\) columns of the video matrix on disk: (1,1), (2,1), ..., (p,1), (1,2), (2,2), ..., (p,2), ..., (1,n), (2,n), ..., (p,n), each portion being of size \(d\).]

If we assume that once the \(n\)th column has been retrieved, the disk head can be repositioned to the start of the device almost instantaneously, then we can show that \(p\) concurrent phases can be supported, the phase difference between any two phases being a multiple of \(t_c\). The reason for this is that the time to retrieve data for the \(n\) columns sequentially from disk is \(n \cdot p \cdot d / r_t\), which is less than or equal to \(n \cdot d / r_d = t_c\) (due to Equation 3). As a result, every \(t_c\) seconds the disk head can be repositioned to the start. Thus, a new phase can be initiated every \(t_c\) seconds. Furthermore, for every other concurrent phase, the last portion retrieved just before the disk head is repositioned belongs to Column \(n\). Since we assume that repositioning time is negligible, Column 1 can be retrieved immediately after Column \(n\). Thus, since the portion following portion \((i, n)\) in Column \(n\) is portion \((i+1, 1)\) in Column 1, data for concurrent phases can be retrieved from disk at a rate \(r_d\). In Section 3.3, we present schemes that take into account repositioning time when retrieving data for \(p\) concurrent phases.

3.2 Buffering

We now compute the buffering requirements for our storage scheme. With every row of the video matrix, we associate a video buffer, into which consecutive portions in the row are retrieved. Each of the video buffers is implemented as a circular buffer; that is, while writing into the buffer, if the end is reached, then further bits are written at the beginning of the video buffer (similarly, while reading, if the end is reached, then subsequent bits are read from the beginning of the buffer).

With the above circular storage scheme, every \(t_c/n\) seconds, consecutive columns of video data are retrieved from disk into video buffers. The size of each buffer is \(2d\), one half of which is used to read in a portion of the video from disk, while \(d\) bits of the video are transmitted to viewers from the other half. Also, the number of video buffers is \(p\), to store the \(p\) different portions of the video contained in a single column: the first portion in a column is read into the first video buffer, the second portion into the second video buffer and so on. Thus, in the scheme, initially, the \(p\) portions of the video in the first column are read into the first \(d\) bits of each of the corresponding video buffers. Following this, the next \(p\) portions in the second column are read into the latter \(d\) bits of each of the corresponding video buffers. Concurrently, the first \(d\) bits from each of the video buffers can be transmitted to viewers. Once the portions from the second column have been retrieved, the portions from the third column are retrieved into the first \(d\) bits of the video buffers and so on.

Since consecutive portions of a video are retrieved every \(t_c/n\) seconds, consecutive portions of the video are retrieved into the buffer at a rate of \(r_d\). Thus, in the first video buffer, the first \(n\) portions of the video (from the first row) are output at a rate of \(r_d\), while in the second, the next \(n\) portions (from the second row) are output and so on. Thus, data for \(p\) concurrent phases of the video can be retrieved by sequentially accessing the contents of consecutive video buffers.
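The alternation between the two halves of each \(2d\) video buffer can be sketched as follows. This is a simplified model under stated assumptions: time advances in periods of \(t_c/n\), portions are represented by integers, and reading and transmitting within a period are simulated one after the other rather than concurrently. The class and function names are hypothetical.

```python
class VideoBuffer:
    """Circular buffer of two halves of size d: one half is filled from
    disk while the other half is transmitted to viewers."""

    def __init__(self) -> None:
        self.halves = [None, None]
        self.write_half = 0
        self.read_half = 0

    def fill(self, portion: int) -> None:
        # The half being written must already have been transmitted.
        assert self.halves[self.write_half] is None
        self.halves[self.write_half] = portion
        self.write_half ^= 1      # wrap around to the other half

    def drain(self) -> int:
        portion = self.halves[self.read_half]
        self.halves[self.read_half] = None
        self.read_half ^= 1
        return portion


def play_once(disk: list[int], p: int, n: int) -> list[list[int]]:
    """Retrieve the n columns in order; in each period, transmit the
    previous column while the next one is read into the free halves."""
    buffers = [VideoBuffer() for _ in range(p)]
    sent: list[list[int]] = [[] for _ in range(p)]
    for period in range(n + 1):
        if period > 0:            # transmit the column read one period earlier
            for row in range(p):
                sent[row].append(buffers[row].drain())
        if period < n:            # read the next column from disk
            for row in range(p):
                buffers[row].fill(disk[period * p + row])
    return sent


if __name__ == "__main__":
    disk = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]   # column-major, p = 3, n = 4
    print(play_once(disk, p=3, n=4))
```

The assertion in `fill` never fires in this schedule, which reflects the claim that two halves per row suffice.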
3.3 Repositioning

The storage technique we have presented thus far enables data to be retrieved continuously at a rate of \(r_d\) under the assumption that once the \(n\)th column of the video is retrieved from disk, the disk head can be repositioned at the start almost instantaneously. However, in reality, this assumption does not hold. Below, we present techniques for retrieving data for \(p\) concurrent phases of the video when this assumption is relaxed. The basic problem is to retrieve data from the device at a rate of \(r_d\) in light of the fact that no data can be transferred while the head is being repositioned at the start.

A simple solution to this problem is to maintain another disk which stores the video exactly as stored by the first disk and which takes over the function of the disk whose head is being repositioned.

An alternate scheme that does not require the entire video to be duplicated on both disks can be employed if the minimum phase difference \(t_c\) is at least twice the repositioning time. The video data matrix is divided into two submatrices so that one submatrix contains the first \(\lceil n/2 \rceil\) columns and the other submatrix the remaining \(\lfloor n/2 \rfloor\) columns of the original matrix, and each submatrix is stored in column-major form on one of two disks with bandwidth \(r_t\). The first submatrix is retrieved from the first disk, and then the second submatrix is read from the other disk while the first disk is repositioned. When the end of the data on the second disk is reached, the data is read from the first disk and the second disk is repositioned.

If the time it takes to reposition the disk to the start is low in comparison to the time taken to read the entire video, as is the case for disks, then at almost any given instant one of the disks would be idle. To remedy this deficiency, in the following, we present a scheme that is more suitable for disks. In the scheme, we eliminate the additional disk by storing, for some \(m\), the last \(m\) portions of the column-major form representation of the video in RAM, so that after the first \(l \cdot r_d / d - m\) portions have been retrieved from the disk into the video buffers, repositioning of the head to the start is initiated. Furthermore, while the device is being repositioned, the last \(m\) portions of the video are retrieved into the video buffers from RAM instead of the device. Once the head is repositioned and the last \(m\) portions have been retrieved into the video buffers, the columns are once again loaded into the video buffers from disk beginning with the first column as described earlier in the section.

For the above scheme to retrieve data for phases of the video continuously at a rate of \(r_d\), we need the time to reposition the head to be less than or equal to the time to consume \(m\) portions of the video at a rate of \(r_d\); that is,

\[ \frac{m \cdot d}{r_d} \ge t_{lat} \]

where \(t_{lat}\) is the maximum disk latency. Thus, the total RAM required is \(m \cdot d + 2 \cdot d \cdot p\). The cost of retrieving data for \(p\) concurrent phases of the video into the video buffers using the disk in Example 1 and our storage allocation scheme can be computed as follows.
Since the unit of retrieval from disks is a sector, whose size is typically 512 bytes, we can choose the portion size \(d\) to be any multiple of 512 bytes. We choose \(d\) to be 4096 bytes, a common page size in operating systems. Since the maximum number of concurrent phases for the disk is 53, the RAM required for the video buffers is \(4096 \times 53 \times 2 \approx 434\) KB. Since the cost of the disk is $3000, if we use the additional disk to make up for repositioning time, the total storage cost for the system per video would be approximately $6021. On the other hand, if we use the latter scheme that uses RAM, due to the low value of \(t_{lat}\), the cost of RAM is negligible. Thus, the storage cost of the system per video would be $3021 as opposed to $62,500 if the entire video were stored in RAM.
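The RAM figures for the two layouts can be compared with a short calculation. The sketch below evaluates Equation 2 for contiguous storage and the \(2 \cdot d \cdot p\) and \(m \cdot d\) terms for phase-constrained storage, using the parameters of Example 1. The function names are illustrative, and rates are expressed in bits per second as an assumption of this sketch.

```python
import math


def contiguous_buffer_bits(t_lat: float, r_t: float, r_d: float, p: int) -> float:
    """Equation 2: smallest per-phase buffer (in bits) for contiguous storage."""
    return t_lat * r_d * r_t / (r_t / p - r_d)


def video_buffer_bytes(d_bytes: int, p: int) -> int:
    """RAM for p circular video buffers of size 2d each."""
    return 2 * d_bytes * p


def min_ram_portions(t_lat: float, r_d: float, d_bytes: int) -> int:
    """Smallest m satisfying m * d / r_d >= t_lat, with d converted to bits."""
    return math.ceil(t_lat * r_d / (d_bytes * 8))


if __name__ == "__main__":
    r_t, r_d, t_lat, p, d = 80e6, 1.5e6, 0.015, 53, 4096   # Example 1 parameters
    b = contiguous_buffer_bits(t_lat, r_t, r_d, p)
    print(f"contiguous: {b / 1e6:.1f} Mb per phase, {p * b / 1e9:.1f} Gb in total")
    print(f"phase-constrained video buffers: {video_buffer_bytes(d, p)} bytes")
    print(f"portions kept in RAM for repositioning: m = {min_ram_portions(t_lat, r_d, d)}")
```

Under these parameters the contiguous layout needs on the order of 10 Gb of buffer space, while the phase-constrained layout needs 434,176 bytes for the video buffers plus a single 4096-byte portion to cover repositioning.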

4 Implementation of VCR Operations

We now turn our attention to how VCR operations can be implemented in our basic architecture. We assume that videos are digitized and compressed using the widely used MPEG-1 video compression algorithm [Gal91]. However, our scheme for the implementation of VCR operations is general and can be used even if different compression algorithms, transfer rates and playback rates are employed.

The MPEG-1 video compression algorithm requires compressed video data to be retrieved at a rate of about \(r_d = 1.5\) Mbps in order to support the display of moving pictures at a rate of 30 frames per second. MPEG-1 compressed video is a sequence of Intraframe (I), Predicted (P) and Bidirectional (B) frames. I-frames are stand-alone frames and can be decoded independently of other frames. P-frames are coded with reference to the previous frame and thus can be decoded only if the previous frame is available, while a B-frame requires the closest I/P-frame preceding and following the B-frame for decoding. I-frames consume the most bandwidth, while B-frames consume the least (the ratio of the bandwidths consumed by I-, P- and B-frames is 5:3:1).

We refer to a sequence of frames beginning with an I-frame and ending with a P-frame as an independent sequence of frames. Thus, since an independent sequence of frames contains references for every B-frame in it, it can be decoded by an MPEG-1 decoder. The organization of frames in MPEG-1 is quite flexible, the frequency of I-frames being a parameter to the MPEG-1 encoder. We shall assume that in MPEG-1 compressed videos stored on the VOD server, there are \(2k + 1\) BBP frame sequences between any two consecutive I-frames, where \(k\) is a positive integer (see Figure 4).
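The assumed frame organization can be written out explicitly. The sketch below generates the frames from one I-frame up to the next and, treating the 5:3:1 bandwidth ratio as relative frame sizes (an assumption of this sketch, with a B-frame counted as one unit), adds up the size of one such group. The names are illustrative only.

```python
# Relative frame sizes taken from the 5:3:1 bandwidth ratio (B-frame = 1 unit).
UNITS = {"I": 5, "P": 3, "B": 1}


def frames_between_i_frames(k: int) -> list[str]:
    """One I-frame followed by 2k + 1 BBP sequences (up to the next I-frame)."""
    return ["I"] + ["B", "B", "P"] * (2 * k + 1)


def storage_units(frames: list[str]) -> int:
    """Total relative size of a list of frames."""
    return sum(UNITS[f] for f in frames)


if __name__ == "__main__":
    group = frames_between_i_frames(k=1)
    print("".join(group))          # IBBPBBPBBP
    print(storage_units(group))    # 20
```

For general \(k\) the group contains \(6k + 4\) frames and occupies \(10 + 10k\) units, since the I-frame and every BBP sequence each take 5 units.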
In addition to I, B and P frames, there is a special instance of a P-frame, which is a constant frame and which we refer to as a Repeat (R) frame, with the following property: when an MPEG-1 decoder receives an R-frame immediately after it receives a P-frame or an R-frame, it outputs the same frame as the previous one output by it.

Since the consumption rate of the MPEG-1 decoder may not be uniform (it could exceed or fall below 1.5 Mbps), a process at the viewer site continuously monitors the decoder buffer, discarding BBP frames immediately preceding an I-frame if the buffer overflows and inserting additional R-frames between P and I-frames in case the buffer underflows.²

² We do not expect deletion and insertion of a few additional frames to seriously affect the quality of the video since each frame is displayed for only 1/30th of a second.

[Figure 4: A possible sequence of MPEG-1 frames: ... I BBP BBP BBP I ...]

We now describe how the control operations begin, pause, fast-forward, rewind and resume for a video are executed with our basic storage architecture. As we described earlier, contiguous portions of the video are retrieved into \(p\) video buffers at a rate \(r_d\). The first \(n\) portions are retrieved into the first video buffer, the next \(n\) into the second video buffer, and so on.

begin: The transmission of compressed video data to the viewer starts once the first video buffer contains the first frame of the video. Portions of size \(d\) are transmitted to the user at a rate \(r_d\) from the video buffer (wrapping around if necessary). After the \((i \cdot n)\)th portion is transmitted, transmission of video data is resumed from the \((i+1)\)th video buffer. We refer to the video buffer that outputs the video data currently being transmitted to the viewer as the current video buffer. Since in the worst case, \(n \cdot d\) bits may need to be transmitted before the first video buffer contains the first frame of the video, the delay involved in the transmission of a video when a viewer issues a begin command is, in the worst case, the minimum phase difference \(t_c\).

pause: Once a P-frame immediately preceding an I-frame is transmitted, subsequent frames transmitted to the viewer are R-frames.

fast-forward: Beginning with the current video buffer, the following steps are executed.

1. Continue transmitting compressed video data normally until a P-frame is transmitted from the current video buffer and the next video buffer contains an I-frame.
2. Transmit video data beginning with the I-frame in the next video buffer.
3. Go to Step 1.

Thus, during fast-forward, independent sequences of frames are transmitted, the number of bits skipped between any two successive sequences being approximately \(n \cdot d\).

rewind: This operation is implemented in a similar fashion to the fast-forward operation except that instead of jumping ahead to the following video buffer, jumps during transmission are made to the preceding video buffer. Thus, beginning with the current video buffer, the following steps are executed.

1. Continue transmitting compressed video data normally until a P-frame is transmitted from the current video buffer and the previous video buffer contains an I-frame.
2. Transmit video data beginning with the I-frame in the previous video buffer.
3. Go to Step 1.

resume: In case the previously issued command was either fast-forward or rewind, bits continue to be transmitted normally from the current video buffer. If, however, the previous command was pause, then once the current video buffer contains the I-frame following the last P-frame transmitted, normal transmission of video data from the video buffer is resumed beginning with the I-frame. Thus, in the worst case, similar to the case of the begin operation, a viewer may experience a delay of \(t_c\) seconds before transmission can be resumed after a pause operation.

Furthermore, the basic architecture enables the viewer to jump to any location in the video in \(t_c\) seconds. During fast-forward and rewind, since independent sequences of frames are transmitted, the MPEG-1 decoder has no problems decoding transmitted data. Also, when switching from one video buffer to another, one of the video buffers must contain an I-frame, while the other must contain a P-frame. However, this is not really a problem, since due to the high frequency of P-frames in the compressed video, it is very likely that every time a video buffer contains an I-frame, adjacent video buffers would contain P-frames. Finally, in the extreme case, \(30 t_c\) frames may be skipped for every IBBP sequence transmitted. Thus, fast-forward and rewind could give the effect that the frames are displayed at approximately \(7.5 t_c\) times their normal rate. We shall refer to the number of frames skipped during fast-forward and rewind as their granularity.

For the disk in Example 1, \(t_c\) for a 100 minute video is approximately 113 seconds. Thus, the worst case delay is 113 seconds when beginning or resuming the display of a video.
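The worst-case figures of the basic scheme follow directly from \(t_c\). The sketch below computes the worst-case begin/resume delay, the worst-case number of frames skipped per jump (\(30 t_c\)), and the resulting apparent speed-up (\(7.5 t_c\), that is, the frames skipped divided by the four frames of an IBBP sequence). The function name and return layout are illustrative assumptions.

```python
def basic_scheme_worst_case(video_length_s: float, p: int,
                            fps: int = 30, frames_per_sequence: int = 4):
    """Worst-case delay (s), frames skipped per jump, and apparent speed-up
    during fast-forward/rewind in the basic architecture."""
    t_c = video_length_s / p                 # minimum phase difference
    skipped = fps * t_c                      # 30 * t_c frames per jump
    speed_up = skipped / frames_per_sequence # 7.5 * t_c for IBBP sequences
    return t_c, skipped, speed_up


if __name__ == "__main__":
    t_c, skipped, speed_up = basic_scheme_worst_case(100 * 60, p=53)
    print(f"delay {t_c:.0f} s, {skipped:.0f} frames skipped, about {speed_up:.0f}x")
```

For the single-disk configuration this gives a delay of about 113 seconds and roughly 3400 frames per jump; the text's figure of 3390 uses \(t_c\) rounded down to 113 seconds.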
The number of frames skipped when fast-forwarding and rewinding is correspondingly 3390 (113 seconds of the video). By reducing the minimum phase difference \(t_c\), we could provide better quality VOD service to viewers. We now show how multiple disks can be employed to reduce \(t_c\). Returning to Example 1, suppose that instead of using a single disk, we were to use an array of 5 disks. In this case, the bandwidth of the disk array increases five-fold from 80 Mbps to 400 Mbps. The number of phases, \(p\), increases from 53 to \(\lfloor 400/1.5 \rfloor = 266\), and, therefore, the minimum phase difference \(t_c\) reduces from 113 seconds to \(6000/266\), or approximately 22 seconds. In this system, the worst case delay is 22 seconds and the number of frames skipped is 660 (22 seconds of the video). The storage cost of the system would increase five-fold, from $3021 to $15,105, which is still less than the cost of storing the entire video in RAM (i.e., $62,500).

Although the basic service may be sufficient for many viewers, there may be viewers who are willing to pay more for higher quality VOD service. Ideally, during fast-forward and rewind, we would like only the \(2k\) BBP sequences between consecutive IBBP sequences to be skipped (typically, the value of \(k\) ranges between 2 and 5). In the following sections, we individually address the following two issues.

1. Reduction of the granularity of fast-forward and rewind.
2. Elimination of the delay in resuming normal display after a pause operation.

5 Improving Granularity of Fast-Forward and Rewind

The granularity of the fast-forward and rewind operations presented in Section 4 is dependent on the phase difference \(t_c\). There are two possible approaches to reducing the number of bits skipped between two successive independent sequences of frames during fast-forward and rewind. We elaborate on both approaches in the following subsections.

5.1 Storing a Fast-Forward Version of the Video

A separate version of the video that is used to perform fast-forward and rewind operations is stored. Since we assume that there are \(2k\) BBP sequences between any two consecutive IBBP sequences in the video, the fast-forward (FF) version is obtained from the compressed MPEG-1 video by omitting the \(2k\) BBP sequences in between two consecutive IBBP sequences. Thus, the FF-version of the video contains only consecutive IBBP sequences of frames, and transmitting it to viewers at a rate of \(r_d\) would result in an effect that is similar to one of playing the video in fast-forward mode.³ The storage required for the FF-version of the video can be shown to be \(\frac{1}{k+1}\) times the storage required for the video.
Since the bandwidth requirements for I, P and B frames are in the ratio 5:3:1, assuming that a B-frame consumes one unit of storage, it follows that P and I frames consume 3 and 5 units of storage, respectively. Thus, since there are $2k+1$ consecutive BBP frame sequences between any two consecutive I-frames, and each BBP sequence consumes 5 units of storage, it can be shown that for every $10 + 10k$ units of the video (an I-frame followed by $2k+1$ BBP sequences), the FF-version contains only 10 units (an I-frame followed by one BBP sequence). Thus, it follows that the FF-version of the video consumes $\frac{10}{10+10k} = \frac{1}{k+1}$ times the storage consumed by the video.

³ Note that since only IBBP sequences are transmitted, it is possible that the rate at which the decoder consumes bits would increase beyond $r_d$. However, the process at the viewer site can insert R-frames to ensure that the buffer never underflows.

5.1.1 Storing the Fast-Forward Version in RAM

One simple option is to store the entire FF-version of the video in RAM. This is more cost-effective than the RAM-based architecture in which the entire video is stored in RAM, since the FF-version of the video occupies only $\frac{1}{k+1}$ times the storage occupied by the video. The operations fast-forward, rewind and resume require transmission to switch between the video buffers and the FF-version of the video. The various operations are implemented as follows (pause and resumption from pause mode are implemented as described in the previous section).

fast-forward: Frames continue to be transmitted from the current video buffer until a P-frame is transmitted. Once a P-frame is transmitted, the first I-frame that follows the P-frame in the video is located in the FF-version of the video. Frames then continue to be transmitted at a rate of $r_d$ from the FF-version, beginning with the I-frame.

rewind: In this case, frames are transmitted from the current video buffer until a P-frame is transmitted. Once a P-frame is transmitted, the first I-frame that precedes the P-frame in the video is located in the FF-version of the video. The following steps are then executed.

1. Transmit frames from the FF-version of the video, beginning with the I-frame, until a P-frame is transmitted.
2. Once the P-frame is transmitted, locate the I-frame in the FF-version that belongs to the IBBP sequence immediately preceding the IBBP sequence just transmitted.
3. Go to Step 1.

resume: Resumption of normal display from fast-forward and rewind is handled in a similar fashion to resumption from pause.
Once a P-frame is transmitted from the FF-version, R-frames are transmitted to the viewer until one of the video buffers contains the first I-frame in the video following the P-frame. Normal transmission is then resumed from the video buffer, beginning with the I-frame.
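The construction of the FF-version and its $\frac{1}{k+1}$ storage ratio can be checked on a toy frame list. The following is a minimal sketch, not the authors' implementation: the function names (`make_video`, `ff_version`, `storage_units`, `rewind_order`) and the list-of-letters representation of frames are assumptions made for illustration, and the per-frame costs simply restate the 5:3:1 ratio used in the text.

```python
from fractions import Fraction
from typing import List

# Relative storage per frame type, taken from the 5:3:1 ratio for I, P and B frames.
FRAME_UNITS = {"I": 5, "P": 3, "B": 1}


def make_video(groups: int, k: int) -> List[str]:
    """Toy video: each group is an I-frame followed by 2k+1 BBP sequences."""
    group = ["I"] + ["B", "B", "P"] * (2 * k + 1)
    return group * groups


def ff_version(frames: List[str]) -> List[str]:
    """Keep each I-frame and the BBP sequence right after it; drop the other 2k BBP sequences."""
    kept: List[str] = []
    i = 0
    while i < len(frames):
        if frames[i] == "I":
            kept.extend(frames[i:i + 4])  # the IBBP sequence
            i += 4
        else:
            i += 1  # frame belongs to one of the omitted BBP sequences
    return kept


def storage_units(frames: List[str]) -> int:
    """Total storage of a frame list in B-frame units."""
    return sum(FRAME_UNITS[f] for f in frames)


def rewind_order(ff_frames: List[str]) -> List[List[str]]:
    """IBBP sequences in the order rewind transmits them: last sequence first,
    each sequence still sent in its normal I, B, B, P order."""
    sequences = [ff_frames[i:i + 4] for i in range(0, len(ff_frames), 4)]
    return sequences[::-1]


k = 2
video = make_video(groups=3, k=k)
ff = ff_version(video)
ratio = Fraction(storage_units(ff), storage_units(video))
print(ratio)  # equals 1/(k+1) for this toy layout
```

Each group of the toy video costs $10 + 10k$ units and contributes 10 units to the FF-version, so the ratio does not depend on the number of groups.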

5.1.2 Storing the Fast-Forward Version on Disk

An alternative to storing the FF-version of the video in RAM is to store it on disk using the phase-constrained allocation scheme described in Section 3, as we did for the video itself. Thus, in addition to the video buffers into which consecutive portions of the video are retrieved, an additional set of buffers is maintained into which consecutive portions of the FF-version of the video are retrieved. We refer to these as FF-buffers. The minimum phase difference for the stored FF-version of the video is $t_{ff}$, which is approximately $\frac{1}{k+1}$ times the minimum phase difference $t_c$ for the video. The number of portions of size $d$ in a row of the FF-version is $n_{ff} = \frac{t_{ff} \cdot r_d}{d}$.

Example 2: Consider a 100 minute video compressed using MPEG-1 for which $k = 2$ (thus, there are 5 BBP sequences between any two consecutive I-frames). The video consumes 1.125 GB of storage, while its FF-version requires $\frac{1.125}{3}$ GB $= 375$ MB of storage space. Also, if the FF-version of the video were stored on the disk described in Example 1 using the phase-constrained scheme, then $t_{ff}$ would be approximately $\frac{113}{3} \approx 37$ seconds.

There are three possible ways in which the fast-forward operation can be implemented. The simplest is to first determine the FF-buffer containing the frames closest to and following the frame currently being transmitted from the current video buffer. After a P-frame is transmitted from the current video buffer, frames are transmitted from the FF-buffer, beginning with an I-frame. The portions contained concurrently in any two consecutive FF-buffers are $n_{ff} \cdot d$ bits apart in the FF-version of the video. Thus, in the worst case, switching from a video buffer to an FF-buffer could result in approximately $n \cdot d$ bits ($t_c$ seconds) of the video being skipped.
However, once transmission is switched from the video buffer to the FF-buffer, $2k$ BBP sequences are skipped between any two consecutive IBBP sequences.

A second approach is to continue transmitting from the current video buffer until a P-frame immediately preceding an I-frame is transmitted, and then simply transmit R-frames until an FF-buffer contains the I-frame that follows the P-frame in the video. In the worst case, this could result in the viewer's display being frozen for about $t_{ff}$ seconds, since the required I-frame may have just been transmitted from the FF-buffer, and waiting for it to be retrieved into the FF-buffer again may result in a delay of $t_{ff}$ seconds.

It would be desirable to avoid both freezing the viewer's display and skipping $n \cdot d$ bits of the video when switching from the video buffer to an FF-buffer. A third solution that achieves both is to simply continue transmitting frames normally from the video buffer until one of the FF-buffers contains the first I-frame in the video that follows the last P-frame transmitted. At this time, transmission of subsequent frames is carried out from the FF-buffers, beginning with the I-frame. In this case, however, the worst-case delay involved in switching from the video buffer to the FF-buffer could be greater than $t_{ff}$ seconds, since the first I-frame following the last transmitted P-frame may have just been transmitted out of the FF-buffer. The I-frame is retrieved into the FF-buffer again after $t_{ff}$ seconds. However, in that time, a few more frames may have been transmitted to the viewer from the video buffer. As a result, in the worst case, before transmission of bits from the FF-buffer can begin, the FF-buffer may need to output $n_{ff} \cdot d$ bits plus the number of bits of the FF-version of the video output by the viewer buffer before transmission switches to the FF-buffer. Thus, in order to compute the worst-case delay $t_d$ involved in switching from the video buffer to the FF-buffer, we use the following equation.

$$n_{ff} \cdot d + \frac{r_d \cdot t_d}{k+1} = r_d \cdot t_d$$

In the above equation, $r_d \cdot t_d$ on the right hand side is the number of bits output from the FF-buffer in time $t_d$. The first term on the left hand side is the number of bits that the FF-buffer must output before it outputs the same bit twice, and the second term is the number of additional bits it needs to output in order to catch up with the viewer buffer (the viewer buffer outputs $r_d \cdot t_d$ bits of video in time $t_d$, or $\frac{t_d \cdot r_d}{k+1}$ bits of the FF-version of the video, since the FF-version of the video is $\frac{1}{k+1}$ times the size of the video). Thus, in the worst case, the delay is

$$t_d = \frac{n_{ff} \cdot d \cdot (k+1)}{k \cdot r_d}.$$

For the video in Example 2, $t_d$ is approximately 56.5 seconds.
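The figures quoted for Example 2 can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: the bit rate of 1.5 Mbps is the MPEG-1 rate given in the introduction, $t_c = 113$ seconds is taken from the reference to Example 1, the balance condition is read as an equality, and $n_{ff} \cdot d$ is replaced by $t_{ff} \cdot r_d$ so that $r_d$ cancels. The function names are hypothetical.

```python
def video_size_mb(minutes: float, rate_mbps: float) -> float:
    """Storage in megabytes for a video of the given length and bit rate."""
    return minutes * 60 * rate_mbps / 8


def ff_switch_delay(t_ff: float, k: int) -> float:
    """Worst-case delay t_d of the third alternative.

    Solves n_ff*d + r_d*t_d/(k+1) = r_d*t_d for t_d, using n_ff*d = t_ff*r_d,
    so that r_d cancels out.
    """
    return t_ff * (k + 1) / k


k = 2
t_c = 113.0                        # seconds, phase difference assumed from Example 1
size_mb = video_size_mb(100, 1.5)  # 100 minute video at 1.5 Mbps
ff_size_mb = size_mb / (k + 1)     # FF-version is 1/(k+1) of the video
t_ff = t_c / (k + 1)               # approximate phase difference of the FF-version
t_d = ff_switch_delay(t_ff, k)
print(size_mb, ff_size_mb, round(t_ff, 1), round(t_d, 1))
```

With these inputs the sizes come out at 1125 MB and 375 MB, and $t_d$ at about 56.5 seconds, matching the example; $t_{ff}$ comes out slightly above the rounded value of 37 seconds used in the text.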
Thus, unlike the case in which the FF-version of the video is loaded into RAM, where a smooth transition to fast-forward mode is possible without delay, in the disk-based solution the transition to fast-forward mode is either abrupt or could, in the worst case, result in a delay of at least $t_{ff}$ seconds.

The problem with storing an FF-version of the video on disk is that the rewind command cannot be implemented using the FF-buffers. The reason for this is that successive IBBP sequences of frames are retrieved into the FF-buffers, while for rewind, once an IBBP sequence of frames is transmitted, the previous IBBP sequence of frames needs to be transmitted. Thus, for the purpose of supporting rewind, we store a different version of the video on the disk, which we refer to as the REW-version of the video. The REW-version, like the FF-version, contains only IBBP sequences of frames, except that the order of appearance of the IBBP sequences in the REW-version is the reverse of that in the FF-version. Also, a separate set of buffers is maintained into which consecutive portions of the REW-version of the video are retrieved, which we refer to as REW-buffers. The minimum phase difference for the REW-version of the video is also $t_{ff}$ seconds.

The alternatives described for the implementation of fast-forward can also be used to implement rewind by switching to the REW-buffer instead of the FF-buffer. In the first alternative, the REW-buffer containing the frames closest to and preceding the frame currently being transmitted from the current video buffer is determined. In the second and third alternatives, instead of transmitting the first I-frame in the video that follows the last P-frame transmitted from the FF-buffer, the first I-frame in the video that precedes the P-frame is transmitted from the REW-buffer. The number of bits skipped by the first alternative and the worst-case delay in the second alternative are as described for fast-forward. However, for the third alternative, the worst-case time delay $t_d$ can be shown to be

$$t_d = \frac{n_{ff} \cdot d \cdot (k+1)}{(k+2) \cdot r_d}.$$

The reason for this is that, initially, there must exist a REW-buffer that, before outputting a maximum of $n_{ff} \cdot d$ bits of the REW-version of the video, will output the I-frame required for switching transmission from the video buffer to the REW-buffer.
However, since frames are continually transmitted to the viewer from the video buffer, in the worst case, only the difference between $n_{ff} \cdot d$ and $\frac{1}{k+1}$ times the number of bits transmitted from the video buffer in time $t_d$ is output from the REW-buffer in time $t_d$ (the viewer buffer outputs $r_d \cdot t_d$ bits of video in time $t_d$, or $\frac{t_d \cdot r_d}{k+1}$ bits of the REW-version of the video, since the REW-version of the video is $\frac{1}{k+1}$ times the size of the video).

$$n_{ff} \cdot d - \frac{r_d \cdot t_d}{k+1} = r_d \cdot t_d$$

For the video in Example 2, $t_d$ is approximately 28 seconds. Thus, the worst-case switching time for REW is about half of that for FF. The resume operation is implemented as in the case in which the entire FF-version is stored in RAM, and may require the display to be frozen for $t_c$ seconds.

5.2 Buffer Based Solution

The schemes for implementing fine-granularity fast-forward and rewind described in the previous subsection either require the entire FF-version of the video to be stored in RAM or result in abruptness or delay in switching to fast-forward/rewind mode. In this subsection, we present a scheme for supporting fine-granularity fast-forward and rewind that does not require the entire FF-version of the video to be stored (in RAM or on disk) and results in a smooth transition to fast-forward and rewind mode. The scheme is especially suitable when only a few viewers are watching the video.

Figure 5: Viewer Buffers for Fast-forward and Rewind. [Figure omitted: the movie laid out as portions (1,1) through (p,n) in columns 1 to n, the movie buffers with the current movie buffer marked, and the viewer buffers arranged around the current viewer buffer.]

Informally, the basic idea underlying the scheme is that if, at all times, we buffer $n \cdot d$ bits of the FF-version of the video following, and $n \cdot d$ bits preceding, the current bit being transmitted, then it is possible to support both fine-granularity fast-forward and rewind without any delays. It is necessary to buffer bits since video buffers output the video at a rate of $r_d$, while during fast-forward/rewind the FF-version of the video, and not the video itself, needs to be transmitted at $r_d$. The reason that it suffices to buffer $n \cdot d$ bits of the FF-version of the video following the current bit being transmitted is that when a viewer issues the fast-forward command, in the time that the buffered $n \cdot d$ bits of the FF-version of the video are transmitted, $n \cdot d$ bits are output by each video buffer. Thus, the $(n \cdot d + 1)$th bit of the FF-version of the video would be output by a video buffer, and by buffering it, its availability for transmission once $n \cdot d$ bits are transmitted can be ensured. Using a similar argument, it can be shown that buffering $n \cdot d$ bits preceding the current bit being transmitted suffices to support continuous rewind.
Note that buffering $x < n \cdot d$ bits of the FF-version of the video preceding or following the current bit being transmitted could result in hiccups due to the unavailability of bits during fast-forward/rewind. The reason for this is that once $x$ bits of the FF-version of the video have been transmitted, the $(x+1)$th bit may not be available, since $x < n \cdot d$ and thus none of the video buffers may have output it while the $x$ bits were being transmitted.

In order to buffer the required bits of the FF-version of the video, $2k+4$ viewer buffers are maintained per viewer watching the video (see Figure 5). A viewer buffer is used to store the FF-version of the video output by a video buffer and has a size of $\frac{n \cdot d - |ibbp|}{k+1} + |ibbp|$, which is the maximum number of bits of the FF-version that a video buffer can output ($|ibbp|$ is the storage required for an IBBP sequence). The buffers are arranged in a circular fashion and each buffer is itself a circular buffer. One viewer buffer stores the FF-version of the video output from the current video buffer. The $k+1$ viewer buffers following this buffer are used to store $n \cdot d$ bits of the FF-version of the video output from the $k+1$ video buffers following the current video buffer, while the $k+1$ viewer buffers preceding it are used to store $n \cdot d$ bits of the FF-version of the video output from the $k+1$ video buffers preceding the current video buffer. The remaining viewer buffer is used to load the FF-version of the video in case the viewer issues a fast-forward or rewind command.

With every viewer buffer are associated the variables start_buf and end_buf. start_buf stores the offset, from the start of the buffer, of the bit in the viewer buffer with the lowest position in the video. end_buf stores the offset, from the start of the buffer, of the last bit contained in the buffer. An additional variable, cur_buf, is used to store the current viewer buffer. During normal display and during pause, cur_buf is the viewer buffer containing the FF-version of the video output by the current video buffer, and during fast-forward and rewind mode, cur_buf is the viewer buffer from which bits are transmitted. The FF-version of a video is loaded into a viewer buffer from a single video buffer.
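The per-viewer memory cost of this scheme follows from the buffer count and the buffer size. The sketch below is illustrative only: it assumes the size expression reads $\frac{n \cdot d - |ibbp|}{k+1} + |ibbp|$, takes $n \cdot d = t_c \cdot r_d$ with $t_c = 113$ seconds and $r_d = 1.5$ Mbps, and uses a made-up IBBP sequence size of 500,000 bits. The function names are hypothetical.

```python
def viewer_buffer_count(k: int) -> int:
    """One current buffer, k+1 following, k+1 preceding, and one spare."""
    return 1 + (k + 1) + (k + 1) + 1


def viewer_buffer_size(n_d_bits: float, ibbp_bits: float, k: int) -> float:
    """Assumed size of one viewer buffer in bits: (n*d - |ibbp|)/(k+1) + |ibbp|."""
    return (n_d_bits - ibbp_bits) / (k + 1) + ibbp_bits


def per_viewer_bits(n_d_bits: float, ibbp_bits: float, k: int) -> float:
    """Total viewer-buffer memory needed for a single viewer."""
    return viewer_buffer_count(k) * viewer_buffer_size(n_d_bits, ibbp_bits, k)


k = 2
n_d_bits = 113 * 1.5e6   # bits output by a video buffer in t_c seconds (assumed values)
ibbp_bits = 500_000      # hypothetical size of one IBBP sequence
total_mb = per_viewer_bits(n_d_bits, ibbp_bits, k) / 8 / 1e6
print(viewer_buffer_count(k), round(total_mb, 1))
```

Because the total grows linearly with the number of viewers, a calculation of this kind makes concrete why the scheme suits the case in which only a few viewers are watching the video.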
Consecutive bits output from the video buffer that belong to IBBP sequences are simply copied into the viewer buffer, beginning from the start of the buffer. When the bit whose position in the video is the smallest among all the bits (belonging to IBBP sequences and) output by the video buffer is copied into the viewer buffer, start_buf for the buffer is set to the offset of that bit from the beginning of the viewer buffer. Bits continue to be copied into the viewer buffer until it contains all the bits output by the video buffer that belong to IBBP sequences. end_buf for the buffer is then set to the offset of the last bit from the start of the buffer.

During fast-forward and rewind, the FF-version of the video is transmitted from the viewer buffers at a rate of $r_d$. While transmitting data from a viewer buffer, if end_buf is reached and start_buf for the buffer is 0, then subsequent bits are transmitted beginning with start_buf in the next viewer buffer. If, on the other hand, start_buf for the buffer is not 0, then subsequent bits are retrieved from the start of the buffer. Once one or more bits have been retrieved from a buffer, if the next bit to be retrieved from the buffer is at offset start_buf from the beginning of the buffer, then subsequent bits are retrieved from start_buf in the next buffer.

Traversing the viewer buffers in the reverse direction (during rewind) is carried out as follows. If the beginning of the buffer is reached and start_buf is not 0, then subsequent bits are traversed beginning with end_buf. On the other hand, if the offset of the current bit from the start of the buffer is start_buf, then subsequent bits are accessed from the previous viewer buffer, beginning with end_buf if start_buf for that buffer is 0, and with start_buf - 1 otherwise.

The various operations are implemented as follows.

begin: $2k+4$ viewer buffers are allocated for the viewer and cur_buf is set to one of them (chosen arbitrarily). cur_buf and the $k$ viewer buffers following it are loaded with the FF-version of the video from the first $k+1$ video buffers. Once the $k+1$ viewer buffers are loaded and the first video buffer contains the first frame of the video, video data is transmitted to the viewer from the first video buffer and, concurrently, the $(k+1)$th viewer buffer following cur_buf is loaded from the $(k+2)$nd video buffer. During normal transmission of bits to the viewer, when transmission switches from the current video buffer to the next, cur_buf is set to the next viewer buffer and loading of the $(k+1)$th viewer buffer following cur_buf begins from the $(k+1)$th video buffer following the current video buffer. Furthermore, during normal display, loading of viewer buffers is restricted to cur_buf, the $k+1$ viewer buffers following cur_buf and the $k+1$ viewer buffers preceding cur_buf.
The maximum latency to start viewing the video is less than $2 \cdot t_c$.

fast-forward: Once a P-frame immediately preceding an I-frame is transmitted from the video buffer, loading of the $(k+2)$nd viewer buffer following cur_buf is initiated from the $(k+2)$nd video buffer following the current video buffer. Concurrently, the I-frame following the P-frame is located in cur_buf and subsequent bits are transmitted from cur_buf, beginning with the I-frame. During fast-forward, every time transmission of bits switches from a viewer buffer to the next buffer (cur_buf is set to the next buffer), the following steps are performed.

1. The loading of the $(k+2)$nd viewer buffer preceding cur_buf from the $(k+2)$nd video buffer preceding the current video buffer is terminated.
2. The $(k+2)$nd viewer buffer following cur_buf is loaded from the $(k+2)$nd video buffer following the current video buffer.

rewind: Once a P-frame belonging to an IBBP sequence is transmitted from the video buffer, loading of the $(k+2)$nd viewer buffer preceding cur_buf is initiated from the $(k+2)$nd video buffer preceding the current video buffer. Concurrently, the I-frame belonging to the sequence is located in cur_buf, and sequences of IBBP frames are transmitted at a rate of $r_d$ in the reverse order of their occurrence in the viewer buffers. During rewind, once every bit from a viewer buffer has been transmitted and transmission switches to the previous viewer buffer (cur_buf is set to the previous buffer), the following steps are performed.

1. The loading of the $(k+2)$nd viewer buffer following cur_buf from the $(k+2)$nd video buffer following the current video buffer is terminated.
2. The $(k+2)$nd viewer buffer preceding cur_buf is loaded from the $(k+2)$nd video buffer preceding the current video buffer.

pause: In this case, bits are transmitted normally from the video buffers until a P-frame preceding an I-frame is transmitted. Once the P-frame is transmitted, subsequent frames transmitted to the viewer are R-frames (loading of viewer buffers continues as before the transmission of R-frames).

resume: In case the previous command was a pause, once the video buffer contains the I-frame following the last P-frame transmitted, transmission of video data to viewers is resumed at a rate of $r_d$. Also, all viewer buffers that were being loaded during the pause operation continue to be loaded. In case the previous command was rewind or fast-forward, bits are transmitted from the viewer buffers until a P-frame is transmitted.
Once the P-frame is transmitted, subsequent frames transmitted to the viewer are R-frames until a video buffer contains the I-frame following the P-frame. During the transmission of R-frames, loading of viewer buffers is restricted to the $k+1$ buffers following and the $k+1$ buffers preceding cur_buf. Once a video buffer contains the I-frame, normal transmission is resumed from the video buffer, beginning with the I-frame.
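The forward and reverse traversal rules for the circular viewer buffers, based on start_buf and end_buf, can be sketched as follows. This is a simplified illustration rather than the authors' code: the `ViewerBuffer` class and the `forward_offsets` and `traverse` functions are hypothetical names, offsets stand in for bits, and each buffer is visited in full, whereas actual transmission would begin at an I-frame somewhere inside cur_buf.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ViewerBuffer:
    start_buf: int  # offset of the bit with the lowest position in the video
    end_buf: int    # offset of the last bit stored in the buffer


def forward_offsets(buf: ViewerBuffer) -> List[int]:
    """Offsets of one buffer in increasing video position.

    Read start_buf..end_buf, then wrap to the start of the buffer and read
    0..start_buf-1 (the second range is empty when start_buf is 0).
    """
    return list(range(buf.start_buf, buf.end_buf + 1)) + list(range(0, buf.start_buf))


def traverse(buffers: List[ViewerBuffer], cur: int, count: int,
             reverse: bool = False) -> List[Tuple[int, int]]:
    """Visit `count` consecutive buffers of the circular arrangement, starting at
    index `cur`, and return (buffer index, offset) pairs in transmission order."""
    step = -1 if reverse else 1
    visited: List[Tuple[int, int]] = []
    for j in range(count):
        idx = (cur + step * j) % len(buffers)
        offsets = forward_offsets(buffers[idx])
        if reverse:
            # Rewind enters a buffer at start_buf-1 (or end_buf when start_buf is 0)
            # and leaves it after reading the bit at start_buf.
            offsets = offsets[::-1]
        visited.extend((idx, off) for off in offsets)
    return visited


a = ViewerBuffer(start_buf=0, end_buf=3)
b = ViewerBuffer(start_buf=2, end_buf=4)
print(traverse([a, b], cur=0, count=2))
print(traverse([a, b], cur=1, count=2, reverse=True))
```

In the example, buffer `b` was filled after its video buffer wrapped around, so forward traversal reads offsets 2, 3, 4 and then 0, 1, while rewind reads the same offsets in the opposite order before moving to buffer `a`.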