
A Video Broadcasting System

Simon Sheu (sheu@cs.nthu.edu.tw)
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 30013, R.O.C.

Wallapak Tavanapong (tavanapo@cs.iastate.edu)
Department of Computer Science, Iowa State University, Ames, IA 50011-1040, U.S.A.

Kien A. Hua (kienhua@cs.ucf.edu)
Computer Science Program, School of EECS, University of Central Florida, Orlando, FL 32816-2362, U.S.A.

Abstract. Recent years have seen intensive investigations of Periodic Broadcast, an attractive paradigm for broadcasting popular videos. In this paradigm, the server simply broadcasts segments of a popular video periodically on a number of communication channels. A large number of clients can be served simultaneously by tuning into these channels to receive segments of the requested video. A playback can begin as soon as a client can access the first segment. Periodic Broadcast guarantees a small maximum service delay regardless of the number of concurrent clients. Existing periodic broadcast techniques are typically evaluated through analytical assessment. While these results are good performance indicators, they cannot reveal subtle implementation difficulties that can prohibit these techniques from practical deployment. In this paper, we present the design and implementation of a video broadcasting system based on our periodic broadcast scheme called Striping Broadcast. Our experience with the system confirms that it offers a low service delay close to its analytical guaranteed delay while requiring small storage space and low download bandwidth at a client.

Keywords: Video-on-Demand, Video Streaming, Periodic Broadcast, Multicast.

1. Introduction

Recent years have witnessed the use of digital video and audio in several important applications such as distance learning, digital libraries, Movie-on-Demand, and electronic commerce. Periodic Broadcast is an attractive delivery paradigm that has been intensively investigated in recent years.
In this paradigm, the server strategically partitions each video into a number of logical segments and periodically broadcasts these segments over the server channels reserved for the video. A server channel is defined as a unit of server capacity (i.e., the minimum of the network I/O bandwidth and the disk I/O bandwidth) required to support a continuous delivery of video data. The client tunes into one or more channels broadcasting the different segments of the video at the proper times. The downloaded segments are temporarily stored in the client disk buffer until they are played out. The playback can begin as soon as the client receives the first few frames of the first segment. While playing back the video frames in its buffer, the client typically switches channels to download subsequent segments. Periodic Broadcast guarantees a maximum service delay of no more than a fixed time period, say θ seconds, since the first segment of the video is broadcast every θ seconds.

© 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Several periodic broadcast schemes have been introduced. The effectiveness of these schemes is demonstrated via analytical assessment. However, some schemes are too expensive or too difficult to implement in practice. This is because they require the client to download data from many different network channels and write the data to various disk locations concurrently, or because they require the client to have high download bandwidth or large storage space. A periodic broadcast scheme can be implemented by associating a multicast address with each server channel. Once the client obtains the multicast addresses for the desired broadcast video, it joins and leaves the multicast groups to download the different video segments without the need to contact the server directly during the download. In this paper, we present the design and implementation of Striping Broadcast. We refined the original design of Striping Broadcast [20], which was shown to outperform Skyscraper Broadcasting [10] via analytical assessment. The desirable properties of Striping Broadcast are summarized as follows. Theoretically, Striping Broadcast ensures a jitter-free playback. With small client buffer space, Striping Broadcast guarantees a low maximum service delay. The degree of multiplexing is also low since the client downloads data from at most three channels concurrently. The client receiving bandwidth is at most three times the playback rate of the video. We implemented the prototype broadcasting system based on this design.
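The join/leave mechanics described above can be sketched with standard UDP multicast sockets. A minimal Python sketch (the paper's prototype itself is Visual C++); the group address range, port, and function names are our own illustrative choices:

```python
import socket
import struct

MCAST_BASE = "239.1.1.0"  # hypothetical base address; one group per server channel
PORT = 5000               # hypothetical port shared by all channels

def channel_address(i):
    """Multicast group address for server channel C_i (assumes consecutive addresses)."""
    base = struct.unpack("!I", socket.inet_aton(MCAST_BASE))[0]
    return socket.inet_ntoa(struct.pack("!I", base + i))

def tune_in(group, port=PORT):
    """Join one channel's multicast group; returns the socket and membership record."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return s, mreq

def tune_away(s, mreq):
    """Leave the group once the segment is downloaded; no server contact is needed."""
    s.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    s.close()
```

Because membership is managed purely by the client's join/leave operations, the server never tracks individual clients, which is what makes the paradigm scale to any number of concurrent viewers.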
Our objective in this paper is to share our implementation experience and to validate the analytical performance of Striping Broadcast with experimental data that include the playback quality perceived by users. The current literature offers little information on the implementation details of periodic broadcast schemes or on the validation of their properties. Our experience with the prototype confirms that the system offers a low service delay close to its analytical guaranteed delay while requiring storage space close to its theoretical bound. The video broadcasting prototype consists of server and client software. Both run on Microsoft Windows platforms in an IP multicast network. Since IP multicast is used to facilitate the broadcast, we use the terms broadcast and multicast interchangeably hereafter. Both the server and the client software are written in Microsoft Visual C++. The client program uses Microsoft DirectShow to handle all decoding and rendering of video and audio data. Currently, the system supports MPEG-1 videos, but it can be extended to support other video formats supported by Microsoft DirectShow, such as MPEG-4. The remainder of the paper is organized as follows. We provide background on existing periodic broadcast schemes and Striping Broadcast in

Section 2 and present the design of the server software in Section 3. The client software is described in Section 4. Our experience with the system and experimental results are reported in Section 5. Finally, we give our concluding remarks and discuss future work in Section 6.

2. Background on Periodic Broadcast

Existing periodic broadcast schemes can be classified into two major categories, namely the Server-Oriented category and the Client-Oriented category. Techniques in the first category reduce service delays by increasing server bandwidth, whereas those in the second category minimize the delays by requiring more server and client bandwidth.

2.1. SERVER-ORIENTED CATEGORY

Staggered Broadcasting [3] is the earliest and simplest video broadcasting technique. This scheme staggers the starting times for broadcasting a video evenly across the available channels. The difference in the starting times is referred to as the phase offset. The advantage of Staggered Broadcasting is that the clients download data at the video playback rate and do not need extra storage space to buffer the incoming data. This scheme, however, scales only linearly with the increase in the server bandwidth. Pyramid Broadcasting [24] addresses this drawback by broadcasting video segments at a very high data rate and allowing the clients to prefetch data into their local buffers. In this scheme, video segments are of geometrically increasing sizes; each segment is broadcast periodically on a server channel of equal bandwidth. This solution requires expensive client machines with enough bandwidth to cope with the high data rate on each broadcast channel. Permutation-Based Broadcasting [1] improves this condition by dividing each channel into s sub-channels that broadcast a replica of the video fragment with a uniform phase delay.
This strategy reduces the requirement on client bandwidth by a factor of s, although the data rate remains very high, which can still flood the prefetch buffer with half of the total data [10]. In Skyscraper Broadcasting [10], the server bandwidth is divided into several logical channels. Each channel has bandwidth equal to the playback rate of the video. Each video is fragmented into several segments, and the sizes of the segments are determined using the following series, referred to as the broadcast series: [1, 2, 2, 5, 5, 12, 12, 25, 25, ...]. In other words, if the size of the first data segment is x, the sizes of the second and third segments are 2x, the fourth and fifth are 5x, the sixth and seventh are 12x, and so forth. The server repeatedly broadcasts each segment on its dedicated channel at the playback rate of the video. To download the video, each client requires communication bandwidth of at most twice the

playback rate of the video. The requirement on client buffer storage is also reduced. Fast Broadcast [12], Dynamic Skyscraper Broadcast [4], and Mayan Temple Broadcast [17] employ different broadcast series. Client-Centric Approach [9] and Greedy Disk-conserving Broadcast [6] utilize extra client network bandwidth to further reduce the access latency; these two schemes also employ different broadcast series. In all the schemes in this category, once client resources have been determined, the service latency can be reduced by adding only server resources. Support for VCR-like interactions such as fast-forward and fast-reverse for periodic broadcast schemes in this category has been investigated [5, 22]. Active Buffer Management (ABM) [5] extends Staggered Broadcasting, while the broadcast-based interaction technique [22] extends Client-Centric Approach. Broadcasting a variable bit rate video can be supported by mapping it into a constant bit rate stream, using the peak bandwidth of the video as the bit rate of the stream or a better mapping approach. To handle packet loss, Forward Error Correction has been used with Periodic Broadcast [13]. The effectiveness of these schemes is evaluated analytically or via simulations.

2.2. CLIENT-ORIENTED CATEGORY

The techniques in this category require an increase in both server and client network bandwidth to reduce service delays since they demand client bandwidth equal to the server bandwidth for broadcasting a video. Harmonic Broadcasting (HB) [11] is the first technique in this category. HB fragments a video into segments of equal sizes and periodically broadcasts each segment on a dedicated channel. The channels, however, have decreasing bandwidths following the Harmonic series. In other words, the first channel is allocated bandwidth equal to the playback rate of the video; the second channel has half the playback rate; the third channel has one third, and so forth.
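The Harmonic channel allocation just described is easy to check numerically; a small Python sketch, with function names of our own choosing:

```python
from fractions import Fraction

def harmonic_channel_rates(p, k):
    """Bandwidth of each of the k channels: channel i gets p/i (Harmonic series)."""
    return [Fraction(p) / i for i in range(1, k + 1)]

def total_bandwidth(p, k):
    """Aggregate server (and hence client) bandwidth for a k-channel Harmonic Broadcast."""
    return sum(harmonic_channel_rates(p, k))

# With a playback rate of 1.5 Mbps and 3 channels,
# the channel rates are 1.5, 0.75, and 0.5 Mbps.
rates = harmonic_channel_rates(Fraction(3, 2), 3)
```

Because the Harmonic series grows logarithmically, even a very large number of channels keeps the total bandwidth at only a few times the playback rate, consistent with the five-to-six-times figure cited in the text.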
The client downloads segments from all the channels concurrently. The original HB, however, cannot deliver all the segments on time [18]. A simple delay equal to the size of one segment solves this problem. Cautious Harmonic Broadcasting [24], Quasi-Harmonic Broadcasting, and Poly-Harmonic Broadcasting [15] also address this problem. Although these schemes use many channels to broadcast a video, the total bandwidth grows slowly following the Harmonic series, typically adding up to only five or six times the playback rate of the video. Nevertheless, these schemes require numerous channels for broadcasting long videos (e.g., 240 channels are required for a 2-hour video if the latency is kept under 30 seconds). Since the client must concurrently obtain video segments from many channels, a storage subsystem capable of moving its read heads fast enough to multiplex among so many concurrent streams would be very expensive. To solve this problem, Pagoda Broadcasting and its variants [16, 14] use segments of equal sizes, but broadcast more than one segment on some channels of equal bandwidth. An optimal broadcasting scheme further minimizes the server bandwidth requirement for broadcasting a video [8]. Given n as the number of segments and S as the size of the video in terms of the size of the first segment, the optimal segment sizes follow a geometric series in which the next segment is ((S+1)^(1/n) - 1) times larger than the current segment. Each segment is broadcast on a dedicated channel with equal bandwidth. Nevertheless, the client bandwidth must match the server bandwidth for broadcasting the video. VCR interactions have also been investigated with a scheme in the client-oriented category [2].

2.3. PERFORMANCE COMPARISONS

The techniques in the client-oriented category have several drawbacks compared to those in the server-oriented category. First, the client must have network bandwidth equal to the server bandwidth allocated to the longest video to broadcast. The requirement on the client bandwidth is, therefore, very high, making the overall system very expensive. Second, improving the access latency requires adding bandwidth to both server and client, which makes system enhancement very costly. The justification for the server-oriented approach is that server bandwidth, shared by a large community of users, contributes little to the overall cost. As a result, these techniques are less expensive than the client-oriented approach, which requires each client to be equipped with significant bandwidth.

Table I. Comparisons of analytical performance

  Technique                     Worst Delay (min.)  Max. No. of Channels  Storage Space (% of video)  Client Disk BW (Mbps)
  Fast Broadcast [12]           0.23                9                     0.50                        15.00
  Client Centric Broadcast [9]  0.40                4                     0.33                        6.00
  GDB(4) [6]                    0.40                3                     0.23                        6.00
  Striping Broadcast            0.38                3                     0.15                        6.00
  Skyscraper Broadcast [10]     0.85                2                     0.36                        4.50

Within the server-oriented category, Table I depicts an analytical performance comparison of the Striping Broadcast scheme presented in this paper with some existing periodic broadcast schemes for the broadcast of a 2-hour video with an average playback rate of 1.5 Mbps. The server bandwidth for broadcasting the video is limited to no more than 15.0 Mbps. The performance metrics are the worst service delay, the maximum number of concurrent channels to download data, the client storage space (in percentage

of the size of the video), and the client disk bandwidth. We observe that for a small worst delay of no more than 0.40 min., Striping Broadcast uses the smallest buffer space, downloads data from at most three channels using network bandwidth of at most three times the playback rate of the video, and requires low disk bandwidth. Skyscraper Broadcasting cannot support a worst delay smaller than 0.85 min. given the same server bandwidth. Given the promising performance of Striping Broadcast, we implemented our video broadcasting system based on Striping Broadcast and validated its theoretical performance with experimental data.

3. Server Software

We describe the implementation of the server software in this section. The server software is responsible for advertising information about the videos being broadcast and for broadcasting the videos. The program allows the user to easily insert videos into and remove them from the broadcast, and to specify parameters controlling the desired maximum service delay, through a graphical user interface (GUI) shown in Figure 1.

Figure 1. Server software: main window.

3.1. SERVER ARCHITECTURE

Figure 2 illustrates the high-level architecture of the server software. The Coordinator accepts the user's commands via the interface and contacts other software modules according to the requested task. Currently, the video database is a collection of video files on a local file system of the server. Three important software modules are the Data Retrieval Handler, the Video Delivery Handler, and the Directory Manager. The coordinator creates both handlers when a new video file is selected for broadcasting. The data retrieval handler is

responsible for retrieving data blocks (a block is the smallest retrieval unit from disk; every block has the same size, which is measured in terms of its playback duration in this paper) from the disk into Streaming Buffers in memory. One streaming buffer is allocated per segment. The data in the buffer are later multicast by the delivery handler. When a video is removed from the broadcast, the handlers of the corresponding video are also destroyed. Note that the two handlers and the streaming buffers are allocated per video instead of sharing one set of handlers and buffers among all of the videos. This reduces the complexity of the server software when handling videos with different playback rates. Better server performance might be achieved with more complex disk and buffer scheduling across all broadcast videos.

Figure 2. Server architecture.

3.1.1. Data Retrieval Handler

The retrieval handler is implemented as a thread. The thread determines the size of each segment using the video playback rate p, extracted from the MPEG system header of the video file, and the following parameters from the user:

K, the number of server channels allocated for the video; each channel has bandwidth equal to p Mbps. Let C_1, C_2, ..., C_K be the K channels reserved for the video. Segment i is denoted by S_i and is periodically broadcast on its own channel C_i by the video delivery thread.

N, where 2 <= N <= K. N is used to control the size of the largest video segments, which affects the service delay and the amount of required client buffer space.

Let L be the time taken (in minutes) to play out the entire video at the playback rate; in other words, L is the playback duration of the video. Let L_i be the size of segment i for i in [1, K], measured in terms of the

playback duration of the segment. The data retrieval thread calculates L_i as follows:

  L_1 = L / ((K - N + 2) * 2^(N-1) - 1),
  L_i = 2^(i-1) * L_1,   i in [2, N-1],
  L_i = 2^(N-1) * L_1,   i in [N, K].    (1)

Listing the sizes of all the segments in terms of the first segment gives the following series of K elements: <1, 2, 4, 8, ..., 2^(N-2), 2^(N-1), ..., 2^(N-1)>. This shows that the sizes of the first N-1 segments increase geometrically, but the sizes of the last K-N+1 segments are kept the same. L_1 is also the maximum service delay; on average, the service delay is L_1/2. According to Equation (1), the geometric increase enables the first segment to be small, resulting in a short maximum service delay. L_1 is smallest when N equals K, but the requirement on buffer space increases in this case. This is because the later segments become much larger; when they are downloaded ahead of their playback time, more buffer space is needed. To reduce the buffer space requirement, each of the largest segments is further divided into two equal-sized fragments, or stripes, as follows. We first define a superblock as consecutive blocks whose combined size is equal to L_1. Each superblock of segment i, where i in [N, K], is divided equally into two parts. The first part is assigned to the first stripe of S_i (S_i1), and the second part belongs to the second stripe (S_i2). The same process is repeated for the next superblock until all the superblocks in the segment have been considered. Figure 3 shows the logical segmentation for a video having 376 blocks when K = 6 and N = 5. Each block in the figure is labeled with its block ID. The playback order is the increasing order of the block IDs. The playback rate is assumed to be 1 minute per block. Since K, the number of channels, is equal to 6, the video file is logically partitioned into six segments. Using Equation (1), the first segment consists of 8 blocks. Since N equals 5, each of the first four segments is twice as large as the segment preceding it.
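Equation (1) and the worked example can be verified directly; a short Python sketch (the helper name is ours):

```python
def segment_sizes(video_len, k, n):
    """Segment sizes L_1..L_K from Equation (1), in playback-time units.

    video_len: playback duration L of the video
    k: number of server channels; n: shape parameter, 2 <= n <= k
    """
    # L_1 = L / ((K - N + 2) * 2^(N-1) - 1)
    l1 = video_len / ((k - n + 2) * 2 ** (n - 1) - 1)
    sizes = [l1]
    for i in range(2, k + 1):
        # doubles up to segment N-1, then stays at 2^(N-1) * L_1
        sizes.append(2 ** (min(i, n) - 1) * l1)
    return sizes

# The example from the text: a 376-block video, K = 6, N = 5.
sizes = segment_sizes(376, 6, 5)
```

For this example the denominator is (6 - 5 + 2) * 2^4 - 1 = 47, so L_1 = 376/47 = 8 blocks, and the sizes come out as 8, 16, 32, 64, 128, 128, which sum back to the full 376-block video.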
The sizes of the fifth and the sixth segments are limited to 2^4 * L_1. Each of these largest segments is divided into two stripes as follows. For each superblock of S_5, the first half of the superblock is assigned to the first stripe (S_51), and the second half is given to the second stripe (S_52). The same process is applied to S_6. The striping feature saves client storage over the case without striping since one of the stripes is scheduled to arrive just in time for rendering; hence, very little buffer space is needed for that stripe. The maximum theoretical required client storage in Mbits is [21]

  60 * p * L_1 * (3 * 2^(N-3) - 1).    (2)

Figure 3. Logical segmentation of a video with N = 5 and K = 6.

Once the segments are determined, the data retrieval thread brings multiple blocks of each segment from the disk into the streaming buffer of that segment in rounds, starting from the beginning of the segment. Blocks are kept in the streaming buffer as long as the buffer is not full. The maximum size of each streaming buffer is set to the minimum of the segment size and a predetermined value. This value is determined experimentally, in a separate set of experiments, by increasing it until the server can broadcast the desired set of videos; this handles the case in which the effective disk bandwidth of the system is insufficient to support the broadcast of the desired videos. If the buffer is full, the retrieval thread sleeps until it is awakened by the delivery thread after some blocks are multicast and discarded from the streaming buffer. Some small segments are also kept in memory, so disk I/Os are not required for these segments on subsequent broadcasts. Since memory is relatively much less expensive than server communication bandwidth, we were not concerned with the available memory space.
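The superblock striping rule described above can be sketched as follows (Python; the toy block IDs and helper name are illustrative):

```python
def stripe_segment(block_ids, superblock_len):
    """Split one of the largest segments into two stripes, superblock by superblock.

    block_ids: the segment's blocks in playback order
    superblock_len: blocks per superblock (combined size equal to L_1)
    The first half of every superblock goes to stripe S_i1, the second half to S_i2.
    """
    s1, s2 = [], []
    half = superblock_len // 2
    for start in range(0, len(block_ids), superblock_len):
        sb = block_ids[start:start + superblock_len]
        s1.extend(sb[:half])
        s2.extend(sb[half:])
    return s1, s2

# A toy segment of 16 blocks with L_1 = 4 blocks per superblock:
stripe1, stripe2 = stripe_segment(list(range(16)), 4)
```

Each stripe thus interleaves half-superblocks across the whole segment, which is what lets the second stripe of each superblock arrive just in time for rendering.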
All streaming buffers for the video are deallocated when the video is removed from the broadcast or when the user exits the server program.

3.1.2. Video Delivery Handler

The delivery handler is also implemented as a thread. The thread periodically multicasts the data in the buffers on the server channels using the associated multicast addresses. Each of the channels broadcasting the largest segments is further divided into two subchannels, each with equal bandwidth of 0.5 * p Mbps. The server repeatedly broadcasts S_i on C_i for i in [1, N-1]. Each stripe of the largest segments is repeatedly broadcast on its own subchannel. That is, the first stripe S_i1, where i in [N, K], is repeatedly broadcast on subchannel C_i1; similarly, each S_i2 is repeatedly broadcast on subchannel C_i2. Each segment (or stripe) is thus repeatedly broadcast on its own channel (or subchannel). Since not all segments need to be broadcast at the same time, the video delivery thread schedules the first broadcasts of the different segments at different times. The demand on the client buffer space is minimized since each segment is stored in the client buffer for

a short duration and is soon played out. The delay (phase delay) of the first broadcast of each segment is determined as follows:

  phase delay of S_1 = 0,
  phase delay of S_i  = L_i / 2,   i in [2, N-1],
  phase delay of S_i1 = L_i / 2,   i in [N, K],
  phase delay of S_i2 = L_i,       i in [N, K].    (3)

The first segment is broadcast right away (i.e., its phase delay is zero). The delay of the first broadcast of segment i, where i in [2, N-1], is half the size of the segment. Furthermore, all the first stripes of the largest segments are first broadcast with the same phase delay, and all the second stripes are first broadcast at the same time but delayed by twice the phase delay of the first stripes. The schedule for broadcasting the video segments of the previous example is depicted in Figure 4.

Figure 4. Broadcast schedule for the video file shown in Figure 3.

To notify the client of the arrival times of the other segments relative to the arrival time of the corresponding first segment, the video delivery thread broadcasts a Broadcast Tag before each broadcast of the first segment. Before downloading the first segment, the client receives the broadcast tag and uses it to derive the order of download of the other segments. The Broadcast Tag is an (N-1)-bit representation ([b_(N-1) b_(N-2) ... b_1]_2) of the broadcast index associated with each broadcast of the first segment. The index value of the first broadcast is set to 2^(N-2). For each subsequent broadcast, the index value is decremented by one.
When the index value reaches zero, the value is reset to 2^{N-1} - 1 for the next broadcast, and the same decrementing process continues. Suppose

that the client starts receiving the first segment from the xth broadcast. The associated broadcast index is computed as follows:

    broadcast index of the xth broadcast =
        (2^{N-2} - x) mod 2^{N-1},              for x <= 2^{N-2},
        (2^{N-2} - x) mod 2^{N-1} + 2^{N-1},    otherwise.        (4)

For instance, given the broadcast schedule in Figure 4, where N equals 5 and K equals 6, the broadcast index associated with the 6th broadcast of the first segment is two ((2^{5-2} - 6) mod 2^{5-1} = 2). The corresponding broadcast tag is 0010_2. The facts that the segment sizes are powers of two and that the broadcast tag is generated this way are essential to prove that Striping Broadcast guarantees a jitter-free playback and that the client downloads data from no more than three channels concurrently. Readers interested in the proofs are referred to Reference [21].

Before starting the broadcast, the video delivery thread creates a socket for each server channel and binds the socket to the multicast address and the port number associated with the channel. After that, the delivery thread calls the setsockopt() function to set the time-to-live (TTL) value of multicast packets to 32. Setting the TTL value limits the multicast packets to networks that are at most 32 hops away from the server, since each router decrements the TTL value of a packet it forwards. The router forwards the packet only if the TTL value of the packet is greater than the TTL scoping threshold set by the network administrator; otherwise, the packet is discarded. The delivery thread does not need to join the multicast groups in order to multicast the packets.

To multicast the data on different server channels concurrently, the video delivery thread multicasts one block of each segment in a round-robin fashion during each playback duration of a block. For each stripe, the thread periodically multicasts one block at every other playback duration of a block.
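To make the schedule concrete, the following is a minimal sketch of Equation (3). The doubling rule L_i = 2^{i-1} L_1 for i <= N, with the largest segments (i in [N, K]) all of size L_N, is our assumption inferred from the worked examples in this section, and the function names are ours, not the prototype's.

```cpp
#include <cassert>

// Playback duration of segment i (1-based), assuming the doubling rule
// L_i = 2^(i-1) * L_1 for i in [1, N] and L_i = L_N for i in [N, K]
// (an assumption consistent with the examples in Figures 4 and 11).
long seg_len(int i, int N, long L1) {
    int e = (i < N ? i : N) - 1;
    return (1L << e) * L1;
}

// Phase delay of the first broadcast, per Equation (3).
// stripe = 0 for a regular segment S_i, 1 for S_i1, 2 for S_i2.
long phase_delay(int i, int stripe, int N, long L1) {
    long Li = seg_len(i, N, L1);
    if (i == 1)      return 0;       // S_1: broadcast right away
    if (stripe == 0) return Li / 2;  // S_i, i in [2, N-1]
    if (stripe == 1) return Li / 2;  // first stripe S_i1, i in [N, K]
    return Li;                       // second stripe S_i2, i in [N, K]
}
```

For N = 5, K = 6, and L_1 = 1 time unit, this yields delays 0, 1, 2, 4 for S_1 through S_4 and 8 and 16 for the first and second stripes, matching the phase delays visible in Figure 4.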
This is essentially similar to transmitting each segment at the playback rate and each stripe at half the playback rate. For instance, suppose S_3 and S_41 are to be broadcast concurrently. The delivery thread first multicasts the first blocks of S_3 and S_41. After the playback duration of one block elapses, the delivery thread multicasts the second block of S_3. In the next period, the delivery thread multicasts the third block of S_3 and the second block of S_41. The same cycle repeats for the subsequent blocks. The video delivery thread arranges each data block into a packet. That is, the payload of each UDP packet has a 4-byte header followed by the data in the block. The header contains the location of the block in the video file. The block location serves as both the timestamp and the sequence number that the client uses to reorder the data blocks for correct playback. The delivery thread also keeps track of the number of late transmissions. When this number reaches a threshold, the delivery thread pauses the broadcast for the video and prompts a warning message that the server

is encountering some problems. Since network conditions change and packet loss occurs in practice, dividing a block into a number of smaller packets and using error concealment techniques to estimate and reconstruct an approximation of the lost data can help minimize the effect of packet loss. Alternatively, forward error correction codes can be added. Since existing error handling techniques are applicable, we do not focus on this issue in this prototype. However, we recommend the use of error handling techniques with periodic broadcast techniques in practice.

3.1.3. Directory Manager

The directory manager maintains and broadcasts the Video Directory, which keeps information about all the videos being broadcast and the broadcast parameters. Each element of the directory has information about the broadcast of a particular video, such as the video title, the multicast address of the channel used for broadcasting the first segment, the broadcast parameters, and important characteristics of the video. Subsequent segments of the video are broadcast using the multicast addresses that follow the address of the first channel. Prior to the broadcasts, socket creation and TTL setting must be done in a similar way as for the video delivery thread. The video characteristics include information obtained from the MPEG sequence header of the video stream. This helps the client software set up its DirectShow decoding and rendering modules quickly. The structure of each directory element is depicted in Figure 5. The video directory is an array of these elements terminated by a null byte. Like the Session Announcement Protocol (SAP), the directory manager multicasts the video directory periodically (every second in the current implementation) so that the client program can get updated information when needed.
We note that the video directory can also be provided to the clients via other means, such as Web servers or an initial negotiation between the video server and the client.

4. Client Software

The user wishing to watch a video runs the client program, JukeBox, and selects a multicast address on which to receive the video directory. The user then selects a video to watch through a graphical user interface (see Figure 6). In the current version, we assume that the user knows the multicast address used for broadcasting the video directory. For actual deployment, JukeBox can be extended to obtain this multicast address from the server.

typedef struct Vdir_tag {
    CString title;       // video title
    DWORD   size;        // video size in bytes
    BYTE    N;           // parameter N (>= 2)
    BYTE    K;           // parameter K (>= N)
    IN_ADDR home;        // multicast address of the channel for the 1st segment
    WORD    port;        // port number of the first channel
    // Stream-specific fields
    DWORD   mux;         // playback rate
    BYTE    ispts;       // whether the stream has a Presentation Time Stamp
    DWORD   pts;         // Presentation Time Stamp (last 32 bits)
    BYTE    aid;         // audio stream ID
    BYTE    atype[4];    // first 4 bytes of the audio payload
    BYTE    vid;         // video stream ID
    BYTE    vlen;        // number of bytes used in vshape
    BYTE    vshape[136]; // video resolutions
} Vdir;

Figure 5. Directory format for each broadcast video.

Figure 6. Client software: JukeBox.

Figure 7. Client software: client architecture.

4.1. CLIENT ARCHITECTURE

JukeBox is a multi-threaded client with the high-level architecture illustrated in Figure 7. The Coordinator accepts commands from the user via the GUI and translates these commands into a set of actions for the other modules. When the user asks for an updated video directory, the coordinator contacts the Directory Explorer to get the list of broadcast videos and the relevant broadcast parameters. The directory explorer then hands the video directory back to the coordinator to have it presented to the user. To obtain the video directory, the directory explorer first creates a socket and binds the socket to the multicast address on which the video directory is broadcast. The explorer joins the multicast group using the setsockopt(socketid, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) function, where mreq contains the multicast address. If successful, the explorer obtains the video directory using the select() and recv() function calls as in the unicast case. After the user has selected the desired video, a series of tasks is performed by the modules in Figure 7 as follows. First, the video player is configured with the video-specific information from the corresponding element in the video directory. The directory element also specifies the multicast group broadcasting the first segment. The coordinator instructs the Tag Prober to obtain the broadcast tag as soon as possible from the multicast group of the first segment. The three Loaders are responsible for downloading video segments. One loader downloads the first segment right away while the tag prober translates the broadcast tag into the order for downloading the other segments. The playback can begin as soon as the beginning of the first segment is received. The remaining loaders download further segments according to the order indicated by the broadcast tag. Each loader first puts the downloaded data in its staging buffer to avoid socket buffer overflows (see Figure 7).
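The join sequence just described can be sketched with portable BSD sockets as follows. This is our illustration, not the prototype's Windows code; the helper names and the sample group address are made up for the example.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// True when addr is an IPv4 multicast (class D) address, 224.0.0.0/4.
bool is_multicast(const char *addr) {
    in_addr a{};
    if (inet_pton(AF_INET, addr, &a) != 1) return false;
    return (ntohl(a.s_addr) >> 28) == 0xE;
}

// Create a UDP socket bound to `port` and join `group`, as the
// directory explorer does; returns the socket, or -1 on failure.
int join_group(const char *group, unsigned short port) {
    if (!is_multicast(group)) return -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) return -1;

    sockaddr_in local{};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(port);
    if (bind(s, reinterpret_cast<sockaddr *>(&local), sizeof(local)) < 0) {
        close(s);
        return -1;
    }

    ip_mreq mreq{};  // mreq carries the group address, as in the text
    inet_pton(AF_INET, group, &mreq.imr_multiaddr);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    if (setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof(mreq)) < 0) {
        close(s);
        return -1;
    }
    return s;  // ready for select()/recv(), e.g. join_group("239.255.0.1", 5000)
}
```

The server side of Section 3.1.2 sets the packet scope symmetrically, with setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, ...), before sending.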
As soon as the first block is downloaded, the playback can begin. The Disk Buffer Manager takes the data from the three staging buffers and stores them in a temporary disk buffer. The data are later retrieved and pipelined to the video player. We discuss the details of each module as follows.

4.1.1. Tag Prober

After getting the broadcast tag, the tag prober determines from the tag the order in which to download the segments. Since segment i is broadcast on channel i, the order of the download of the segments also determines the order of the channels that the client tunes into. Thus, we call this order a tuning order. Figure 8 illustrates how the tag prober determines the tuning order when the server broadcasts the segments as shown in Figure 4. The algorithm, called DTS, used by the tag prober to determine the tuning order is given in Figure 9.
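Putting Equation (4) and the DTS procedure of Figure 9 together, the tag prober's logic can be sketched as follows. Writing a stripe i_1 as the string "i.1" is our convention for illustration; the prototype's internal representation is not specified in the text.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Broadcast index of the xth broadcast of the first segment, per
// Equation (4); C++'s % keeps the sign of (2^(N-2) - x), so the second
// branch adds 2^(N-1) to wrap the index back into range.
int broadcast_index(int x, int N) {
    int first = 1 << (N - 2);  // index of the first broadcast, 2^(N-2)
    int mod   = 1 << (N - 1);  // 2^(N-1)
    int r = (first - x) % mod;
    return (x <= first) ? r : r + mod;
}

// DTS (Figure 9): derive the tuning order from the broadcast tag.
// Tag bit b_i is bit (i-1) of the index; stripes become "i.1" / "i.2".
std::vector<std::string> dts(int index, int N, int K) {
    std::vector<std::string> R;
    std::deque<int> Q;
    auto bit = [&](int i) { return (index >> (i - 1)) & 1; };
    for (int i = 1; i <= N - 2; ++i) {
        if (bit(i) == 0) {
            Q.push_back(i + 1);           // defer segment i+1
        } else {
            R.push_back(std::to_string(i + 1));
            for (int q : Q) R.push_back(std::to_string(q));
            Q.clear();
        }
    }
    int j = (bit(N - 1) == 0) ? 1 : 2;    // which stripe comes first
    int k = 3 - j;
    R.push_back(std::to_string(N) + "." + std::to_string(j));
    for (int q : Q) R.push_back(std::to_string(q));
    R.push_back(std::to_string(N) + "." + std::to_string(k));
    for (int i = N + 1; i <= K; ++i) {    // remaining striped segments
        R.push_back(std::to_string(i) + "." + std::to_string(j));
        R.push_back(std::to_string(i) + "." + std::to_string(k));
    }
    return R;
}
```

For the 6th broadcast with N = 5 and K = 6, broadcast_index(6, 5) is 2 (tag 0010_2), and dts yields the tuning order 3, 2, 5_1, 4, 5_2, 6_1, 6_2 derived in the text below.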

Figure 8. DTS when N=5 and K=6 and the broadcast tag is 0010_2.

Suppose the tag prober obtains the broadcast tag of the 6th broadcast of the first segment. The corresponding broadcast tag is 0010_2. DTS checks the tag bit by bit from right to left. Since the least significant bit is zero, the ID of the corresponding segment (i.e., 2) is entered into queue Q. Because the second least significant bit is 1, DTS appends the ID of the corresponding segment (i.e., 3) to the tuning order R, followed by the current content of Q. R now contains the partial list 3, 2, and Q is emptied. Since the third least significant bit is zero, 4 is appended to Q. The most significant bit is 0, directing DTS to append the ID of the first stripe of segment 5 (denoted 5_1) to R first. The current content of Q (i.e., 4) and the ID of the second stripe of segment 5 (denoted 5_2) are then appended. R now holds the partial list 3, 2, 5_1, 4, 5_2. The tuning order for the subsequent stripes is simply 6_1 and 6_2. Thus, the final tuning order is 3, 2, 5_1, 4, 5_2, 6_1, 6_2. The tuning order is then used by the loaders to download the segments. The overhead of computing the tuning order is negligible since the algorithm is simple: it scans the broadcast tag only once. Using the DTS algorithm in Figure 9, we proved that the client does not need to download video segments from more than three channels concurrently [21]. This reduces the disk I/O and network bandwidth requirements of the client.

4.1.2. Loaders

The three loaders are implemented as threads running the same routine. Recall that one loader is used to download the first segment right away.
Once DTS determines the tuning order from the broadcast tag, each of the remaining loaders removes the first channel ID from the tuning order, one by one, and tunes into the corresponding channel as soon as possible (i.e., joins the corresponding multicast group). This enables the loader to join the multicast group in time to download the segment. However, the loader may receive packets that belong to the previous broadcast of the segment. In this case, the loader ignores these extra packets by comparing the block ID in the

packet header with the range of the expected blocks of the segment. When the current download is finished, the loader becomes available and repeats the same process until the tuning order becomes empty. As the loaders fill the buffers, the disk buffer manager fetches the data from disk and passes the data to the video player in the playback order. The video player uses DirectShow modules to decode and display the data.

ALGORITHM: Deterministic Tuning Schedule (DTS)
INPUT:   Broadcast tag: [b_{N-1} b_{N-2} ... b_1]_2
         Length of the broadcast tag in bits: N-1
         Number of video segments: K
OUTPUT:  Result queue: R
LOCAL:   Temporary queue: Q; working variables: i, j, and k

FOR i := 1 TO N-2 DO
    IF b_i is zero THEN
        Append i+1 to Q;
    ELSE
        Append i+1 to R; Append Q to R; Empty Q;
    ENDIF
ENDFOR
IF b_{N-1} is zero THEN j := 1; k := 2; ELSE j := 2; k := 1 ENDIF;
Append N_j to R; Append Q to R; Append N_k to R;
FOR i := N+1 TO K DO
    Append i_j to R; Append i_k to R;
ENDFOR
RETURN(R);

Figure 9. Algorithm for determining the tuning order.

In Figure 10, we show

the example when the loaders use the above tuning order (i.e., 3, 2, 5_1, 4, 5_2, 6_1, 6_2) to download the video segments broadcast by the server in Figure 4.

Figure 10. Jitter-free playback at the client.

Figure 10 shows the loader that downloads each segment, the arrival time of the segment, and the time at which its playback starts (termed the playback time), measured from the 6th broadcast of the first segment. Each segment is entirely downloaded by one loader. The time unit on the x-axis is the playback duration of the first segment (L_1). The arrival time of each segment is labeled on the top-left corner of the segment. Although the size (in bytes) of each stripe is half the size of its segment, the stripe is also broadcast at half the playback rate. Thus, the time to download an entire stripe is still equal to the playback duration of the segment. The figure also shows that each segment arrives at the client before the playback time of the segment. For instance, S_4 arrives at the client at time 6, but the playback time of S_4 is time 7. The facts that segments arrive before their playback times and that the download rate equals the playback rate ensure that the playback is jitter free. In addition, at each time step, at most three segments are being downloaded concurrently. For instance, between times 10 and 14, three segments, S_4, S_51, and S_52, are downloaded concurrently. To handle out-of-order UDP packets, the loader also allows blocks from the same segment to be received out of order. The loader detects this by examining the block ID in the packet header. Only blocks in the expected range are inserted into the staging buffer. Packet loss is handled by DirectShow during the playback of the video by skipping the lost data up to the nearest decodable data.
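The block-ID checks above can be sketched as follows. We assume here, purely for illustration, that the 4-byte header carries the block location in network byte order; keying the staging buffer by block ID lets out-of-order arrivals end up sorted for playback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Accept a UDP payload: a 4-byte block-location header followed by the
// block data. Blocks outside [lo, hi] (e.g., from a previous broadcast
// of the segment) are ignored; accepted blocks are keyed by block ID,
// so iterating the map yields them in playback order.
bool accept_packet(const uint8_t *payload, std::size_t len,
                   uint32_t lo, uint32_t hi,
                   std::map<uint32_t, std::vector<uint8_t>> &staging) {
    if (len < 4) return false;                 // truncated packet
    uint32_t id = (uint32_t(payload[0]) << 24) | (uint32_t(payload[1]) << 16) |
                  (uint32_t(payload[2]) << 8)  |  uint32_t(payload[3]);
    if (id < lo || id > hi) return false;      // stale packet, ignore
    staging[id].assign(payload + 4, payload + len);
    return true;
}
```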
In practice, the time required to join a multicast group is not negligible. In the implementation, we therefore made a slight change to the design. For each broadcast of the first segment, the delivery thread attaches the broadcast tag of the next broadcast (instead of the current broadcast) to the beginning of the current broadcast. In other words, the tag of the next broadcast is multicast right before the data of the current broadcast. After receiving the tag and calculating the download order it specifies, the loaders join the multicast groups to receive the data as soon as possible. Since

the loader is most likely to join the group successfully before the segment arrives, the loader ignores the extra packets as out-of-order packets. However, if the segment arrives before the loader can successfully join the multicast group, a jitter occurs at the segment boundary. Fortunately, this condition rarely happens because each client has three loaders and the broadcast tag is multicast early, which gives the loaders some time to join the multicast groups. In the case that the broadcast tag itself is lost, the client waits for the subsequent tag, which results in a longer service delay.

4.1.3. Disk Buffer Manager

The disk buffer manager manages the temporary buffer. Since video segments are received out of their playback order, the straightforward implementation is to create the complete image of the video in the local file system. When a block is received, it is stored in this file at the location indicated in the packet header. However, such an implementation requires buffer space as large as the entire video. We, on the other hand, strive to implement a disk buffer manager that does not use much more buffer space than the theoretical bound computed using Equation (2). A temporary file is created on disk as the client buffer and deleted after the user closes the playback window. The buffer size depends on the video length, the playback bit rate, and the desired maximum service delay. Hence, broadcasting long videos of higher quality can still require a significant amount of buffer space. Since it is still too costly to cache video data in main memory in some environments (e.g., set-top boxes), we decided to maintain the buffer space on disk to show that Striping Broadcast can be implemented in such environments. The disk buffer manager allocates a file larger than the theoretical bound by the size of the first segment (in bytes).
The extra buffer space is used to conceal the effects of packets arriving late due to network congestion in practice. In other words, instead of playing the first block as soon as it arrives, the client uses the extra buffer space to store more blocks before initiating the playback. Hence, if the needed blocks arrive at most L_1 time units late, the playback can still continue without jitters. Figure 11 illustrates how the disk buffer manager works. The video file, having 14 blocks, is multicast from the server with both N and K equal to 3. The playback rate is one block per minute. Each data block is denoted by a square labeled with its block ID. Using Equation (1), S_1 has two blocks, blocks 1 and 2. S_2 is twice as large as S_1; thus, it has the next four blocks. The last segment S_3 has two stripes: S_31 and S_32. A disk buffer of six slots (blocks), d_1, d_2, ..., d_6, is created by the disk buffer manager, since the buffer is larger than the theoretical buffer requirement (4 blocks using Equation (2)) by the size of the first segment (2 blocks) to cope with late arriving packets, as mentioned previously. As shown in Figure 11(a), the last blocks of S_1 and S_2 are aligned with the last slot of the buffer. Blocks of S_1 and S_2 are stored in

d_5 and d_6 at different times. S_31 and S_32 also use the slots previously used by the first two segments at a later time. Despite the sharing of the disk buffer, we show by example that there are no buffer overruns, i.e., situations in which two loaders store data in the same location concurrently.

Figure 11. Disk buffer management.

Figure 11(a) illustrates the scenario in which S_1 and S_2 are downloaded at the same time. The relative receiving time and the relative playback time of each block are shown on the top-left corner of the data block. Blocks of regular segments are downloaded at the rate of one block per minute. Blocks of stripes are downloaded at half the playback rate, taking two minutes to download a block. Each block of the second stripe is consumed right after the completion of its download. The client starts receiving S_2, S_31, and S_32 at 0, 2, and 6 minutes, respectively, after it starts downloading the first segment. At time 0, blocks 1 and 3 are downloaded into d_5 and d_3, respectively. At time 1, block 1 has been consumed, while blocks 2 and 4 are downloaded into d_6 and d_4, respectively. There is still one free slot, d_5, for storing an incoming block. At time 2, block 2 in d_6 is consumed, and block 5 is downloaded into d_5. At time 3, block 3 is consumed, and block 6 is stored in d_6. Note that block 2 in d_6 has already been consumed. If S_2 is downloaded later than S_1, buffer overruns are not possible since there will be more free slots available. Now, we show that buffer overruns cannot occur among stripes.
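The alignment in Figure 11(a) amounts to a simple rule for the regular segments: the last block of each segment maps to the last slot. The formula below is our restatement of the figure, with 1-based blocks and slots; it is a sketch, not the prototype's code.

```cpp
#include <cassert>

// Slot (1-based) holding block b of a regular segment whose last block
// is e, in a disk buffer of B slots: the segment's last block is
// aligned with the last slot, as in Figure 11(a).
int slot_for(int b, int e, int B) {
    return B - (e - b);
}
```

For the 14-block example (B = 6), S_1's blocks 1 and 2 land in d_5 and d_6, and S_2's blocks 3 through 6 land in d_3 through d_6, exactly the placements walked through above.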
S_31 and S_32 are received 4 blocks apart, according to their phase delays. Two scenarios are depicted in Figure 11(b). In Case 1, S_31 is received first; d_1, d_2, d_3, and d_5 are used for S_31, while d_4 and d_6 are repeatedly used for S_32. This case is similar to the case in Figure 11(a). Blocks 7 and 9 are stored in d_1 and d_2, which have not been used by any segments or stripes. Block 11 is stored in d_3

at time 6, or three minutes after block 3, which occupied the same slot, has been rendered. Similarly, block 13 is stored in d_5 three minutes after block 5 in d_5 has been rendered. Some blocks of the second stripe are stored in disk slots used earlier by S_2. At time 6, block 8 is downloaded into d_4, from which block 4 has been rendered since time 4. At time 8, block 10 is downloaded into d_6, from which block 6 was rendered at time 6. Within S_32, blocks 8 and 12 can share d_4 since block 12 is stored in d_4 after block 8 has already been consumed. No buffer overruns occur. It can be shown similarly that there are no buffer overruns for Case 2, when S_32 is received before S_31.

4.1.4. Video Player

Figure 12. Architecture of the video player.

The video player uses Microsoft DirectShow modules to handle all decoding and rendering. The architecture of the player is shown in Figure 12. The core of DirectShow services consists of a collection of different types of filters connected in a configuration called a filter graph. Filters are COM objects that perform specific functions. DirectShow uses the Filter Graph Manager (FGM) to coordinate the filters and to allocate the shared buffers between them for exchanging media data. The FGM also controls the data flow and synchronizes the filters so that time-stamped data samples are delivered at the right time. Applications control the filter graph through the FGM. More information on the process of setting up the filter graph can be found in [23]. The existing DirectShow filters used in our implementation are as follows. The MPEG Video Decoder decodes an MPEG-1 video bitstream and outputs the corresponding uncompressed video data. At least one MPEG sequence header 2 must be found in the bitstream.
This is the reason that our video directory contains video/audio-specific information from the sequence header, so that this decoder and the audio decoder can be set up quickly. The MPEG Audio Decoder decodes MPEG-1 audio data from its input and outputs the corresponding uncompressed audio samples.

2 An MPEG sequence header contains the width and the height of the picture, the picture rate defining the number of frames to be displayed per second, etc.

The Video Renderer displays uncompressed video data on the screen. The Audio Renderer renders uncompressed audio samples to a sound device. The FGM provides a set of COM interfaces 3 for an application to access the filter graph. The most important interface is IMediaControl, which allows the application to issue commands to run, pause, and stop the filter graph. When the filter graph is running, all filters in the graph continuously transport data from one filter to the next so that the media data are rendered. When the filter graph is paused, the filters process data but do not render them. In a stopped filter graph, the filters release resources and do not process any data. We implemented a new MPEG-1 stream splitter, named Piper, in our video player. Piper takes data from the disk buffer manager, parses the data, and separates them into audio and video streams according to the MPEG specification. Piper is more flexible than the DirectShow MPEG-1 stream splitter: it not only accepts an MPEG-1 system bitstream, as the DirectShow MPEG-1 stream splitter does, but also accepts an MPEG-1 video bitstream (without an audio stream) or an MPEG-1 audio bitstream (without a video stream) from the disk buffer manager. Piper learns from the video directory whether the bitstream contains both audio and video or only one of them. Hence, the player can play MPEG-1 files with or without audio. Furthermore, error handling techniques suitable for video or audio frames can be implemented in Piper in the future to correct errors before the data are passed to the appropriate decoding filters. When the video player receives commands from the coordinator in response to user inputs, the video player controls the FGM through the IMediaControl interface to start, stop, or pause the filter graph. The current prototype does not support VCR functions. Readers interested in the extension of Striping Broadcast to support these functions are referred to Reference [19].
To support other commercial video players in the future, the client software can be implemented as middleware that reassembles video data from the different channels and feeds the data to other players in the playback order.

5. Experience

We conducted several experiments to assess the performance of the broadcasting prototype on the IP multicast testbed shown in Figure 13. Our testbed consisted of two 100 Mbps Ethernet subnets interconnected through a Linux machine configured as a multicast router as follows. We recompiled the Linux kernel 2.4.2-2 to enable its multicast support and ran the pimd routing daemon [7].

3 A specific memory structure containing an array of pointers to functions implemented by the component.