A GoP Based FEC Technique for Packet Based Video Streaming

YUFEI YUAN 1, BRUCE COCKBURN 1, THOMAS SIKORA 2, and MRINAL MANDAL 1,2
1 Dept of Electrical and Computer Engg, University of Alberta, Edmonton, CANADA
2 Institut für Telekommunikationssysteme, Technische Universität Berlin, Berlin, GERMANY

Abstract: - In this paper, we propose an efficient forward error correction (FEC) technique for video transmission in a lossy network. Here, the FEC is applied on source packets at the group of pictures level, assuming an MPEG-like compression scheme. We also derive analytically an estimate of the playable frame rate for the proposed technique. It is shown, by both analysis and simulation, that the proposed FEC technique provides a better playable frame rate than the classical frame-level FEC techniques.

Key-Words: - Video Streaming, Forward Error Correction, MPEG, Group of Pictures, Playable Frame Rate

1 Introduction
Video traffic through the Internet has increased significantly in recent years. Typical video applications include news broadcasts, video clips, music television, and video conferencing. However, the Internet has limited bandwidth, and excessive traffic may lead to congestion at times. Videos are generally transmitted through the Internet using packets. When congestion occurs in the network, some video packets are likely to be lost. Several techniques have been suggested to solve the packet loss problem. The forward error correction (FEC) scheme [1] has been shown to be an effective way to combat packet loss during video streaming. In FEC, redundant packets are transmitted along with source packets. If the number of lost packets does not exceed the number of redundant packets, the video data can be reconstructed without error. In this paper, we consider developing an efficient FEC-based transmission mechanism.

Several FEC-based techniques have been proposed in the literature. Mayer-Patel et al. [2] presented an analytical FEC model for the MPEG frame structure that uses three types of frames (I, P, and B). Wu et al.
[3] extended this model and derived analytically the playable frame rate (PFR) for a given packet loss probability. However, these techniques assume that the FEC coding rate is allocated among the different picture types. This allocation strategy is not necessarily the best strategy for packet-based FEC in the MPEG framework.

In this paper, we propose an FEC technique for video streaming. Here, the FEC is applied at the group of pictures (GoP) level instead of being applied only at the frame level. The average playable frame rate (PFR) for the proposed FEC technique is derived analytically. It is shown, by both analysis and simulation, that the proposed FEC technique provides a better PFR than the frame-level FEC technique.

The paper is organized as follows. Section 2 presents a brief overview of the current FEC techniques. The proposed analytical model is then derived in Section 3. Performance of the proposed technique is evaluated in Section 4, which is followed by the conclusions in Section 5.

2 Review of Background Work
In this section, we present a brief review of the background work on FEC based video streaming. When video packets are sent through a lossy channel, some packets are likely to be lost. This packet loss is generally modeled as Bernoulli trials. When $K$ source packets are transmitted with $(N-K)$ redundant packets, the probability of successful transmission is given by [4]

$$\Psi(N, K, p) = \sum_{r=K}^{N} \binom{N}{r} (1-p)^{r} p^{N-r} \tag{1}$$

where $p$ is the packet loss probability.

Current video coding standards such as MPEG use the so-called hybrid coding, where redundancy in the frames is first removed by motion compensation. Further redundancy reduction is then obtained using the block based discrete cosine transform. Note that in MPEG video coding, there are three types of frames (I, P, and B) as shown in Fig. 1. Wu et al. [3] have recently proposed an analytical model (henceforth referred to as the frame-level FEC technique) for optimizing FEC-based transmission in the GoP based MPEG framework. The FEC packets are

generated based on individual frames (I, P, or B) as shown in Fig. 2.

Fig. 1: The structure of a GoP and the inter-frame dependency relationship within it.

If each GoP includes one I-frame, $N_P$ P frames and $N_B$ B frames, the effective GoP transmission rate is given by

$$G = \frac{T / s_{pkt}}{(S_I + S_{IF}) + N_P (S_P + S_{PF}) + N_B (S_B + S_{BF})} \tag{2}$$

where
$T$ is the transmission rate,
$s_{pkt}$ is the packet size,
$S_I$ is the size of I frames (in packets),
$S_P$ is the size of P frames (in packets),
$S_B$ is the size of B frames (in packets),
$S_{IF}$ is the number of FEC packets added to each I frame,
$S_{PF}$ is the number of FEC packets added to each P frame,
$S_{BF}$ is the number of FEC packets added to each B frame.

Fig. 2: Arrangement of source and FEC packets in the frame-level FEC technique.

The total PFR by Wu et al.'s technique is given by

$$R = G \cdot q_I \left[ 1 + \frac{q_P - q_P^{N_P+1}}{1 - q_P} + N_{BP} \cdot q_B \left( \frac{q_P - q_P^{N_P+1}}{1 - q_P} + q_I q_P^{N_P} \right) \right] \tag{3}$$

where $q_I$, $q_P$, and $q_B$ are the probabilities of successful transmission of an I, P, or B frame, respectively. The probabilities $q_I$, $q_P$, and $q_B$ can be expressed as follows:

$$q_I = \Psi(S_I + S_{IF}, S_I, p), \quad q_P = \Psi(S_P + S_{PF}, S_P, p), \quad q_B = \Psi(S_B + S_{BF}, S_B, p)$$

where $\Psi(\cdot,\cdot,\cdot)$ is calculated using Eq. (1).

3 Proposed FEC Technique
The frame-based FEC technique provides a good error resiliency performance with appropriate selection of parameters such as $S_{IF}$, $S_{PF}$ and $S_{BF}$ [3]. A problem with this approach is that the allocation of FEC packets for each type of frame is static. In this paper, we propose an FEC model where we first select, depending on the network condition and the GoP structure, an appropriate number of FEC packets for a GoP. The FEC packets are then generated for the entire GoP and added to the original source packets. The number of redundant packets is chosen such that the playable frame rate is maximized.

3.1 The Proposed Model
The organization of frames in a typical GoP looks like the following:

$$I,\ B_{0,0}, \ldots, B_{0,N_{BP}-1},\ P_1,\ \ldots,\ P_m,\ B_{m,0}, \ldots, B_{m,N_{BP}-1},\ P_{m+1},\ \ldots,\ P_{N_P},\ B_{N_P,0}, \ldots, B_{N_P,N_{BP}-1}$$

Note that $N_P$ is the number of P frames, and $N_{BP}$ is the number of B frames between two successive reference frames. The number of B-frames in a GoP is given by $N_B = (1 + N_P) N_{BP}$. Therefore, the total number of source packets is given by

$$K = S_I + N_P S_P + (N_P + 1) N_{BP} S_B \tag{4}$$

where $S_I$, $S_P$ and $S_B$ are the sizes of I, P and B frames (in packets), respectively.
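The GoP layout and the source packet count $K$ described above can be sketched in a few lines of Python. This is only an illustration: the function names and the frame sizes (24, 8, and 3 packets for I, P, and B frames) are assumptions made for the example, not values prescribed by the model.

```python
def gop_frames(n_p, n_bp):
    """List the frames of one GoP in display order: I, B's, P_1, B's, ..., P_NP, B's."""
    frames = ["I"]
    for i in range(n_p + 1):
        # n_bp B frames follow every reference frame (the I frame or P_i)
        frames += [f"B{i},{j}" for j in range(n_bp)]
        if i < n_p:
            frames.append(f"P{i + 1}")
    return frames


def source_packets(s_i, s_p, s_b, n_p, n_bp):
    """Total number of source packets K in one GoP (sizes given in packets)."""
    n_b = (1 + n_p) * n_bp            # number of B frames in the GoP
    return s_i + n_p * s_p + n_b * s_b


if __name__ == "__main__":
    # Hypothetical sizes: S_I = 24, S_P = 8, S_B = 3 packets, with N_P = 3, N_BP = 2
    print(gop_frames(3, 2))
    print("K =", source_packets(24, 8, 3, 3, 2))
```

With $N_P = 3$ and $N_{BP} = 2$ the layout has 12 frames, and the assumed sizes give $K = 24 + 3 \cdot 8 + 8 \cdot 3 = 72$ source packets.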
These $K$ source packets are arranged as shown in Fig. 3. Note that the frames have been arranged as per their predictive behavior. We now add $(N-K)$ FEC packets, resulting in a total of $N$ packets (source and FEC). We propose to use a class of linear erasure codes [5] known as systematic codes. For systematic $(n, k)$ codes, the $k \times n$ generator matrix includes the identity matrix ($k \times k$) as a sub-matrix. As a result, the FEC coded data include the source data. This will provide two advantages. When the number of lost packets is less than or equal to $(N-K)$, the entire GoP can be recovered. Even when the number of lost packets is greater than $(N-K)$, the GoP can be partially recovered.

The advantage of this model over the frame-based model is explained by an example below. Assume that a GoP has 72 source packets, which includes 24 source packets from the I frame. Further assume that the number of FEC packets is 20, and in the frame-based technique 6 out of these 20

correspond to the I-frame. The total number of packets (source + FEC) is therefore 92. During the transmission, let us assume that 15 packets are lost. The GoP based technique can easily reconstruct the entire GoP. However, the performance of the frame-based technique will depend on the frames related to the lost packets. If 10 out of these 15 lost packets belong to the I frame, the I frame cannot be reconstructed, and the entire GoP is virtually lost.

Fig. 3: Arrangement of source and FEC packets in the proposed technique (the $K$ source packets are followed by the $(N-K)$ FEC packets).

It is possible to come up with a counter example where the frame-based technique will perform better than the GoP based technique. Therefore, we derive an analytical formula, in section 3.2, for the playable frame rate (PFR) of the GoP based model. This can then be compared with the PFR of the frame-based model to determine the effectiveness of the proposed model.

3.2 PFR for GoP-based FEC
In order to calculate the overall PFR, we calculate the decoding probabilities of I, P and B frames, which are denoted by $q_I$, $q_P$, and $q_B$, respectively. The calculation of these probabilities is explained below.

In order to calculate $q_I$ (the probability of successfully delivering an I-frame), we classify the delivery of packets into three situations based on the number of lost packets $L$.
a) The I-frame is decodable when $L \le N-K$.
b) The I-frame is decodable with certain probability when $N-K < L \le N-S_I$.
c) The I-frame is not decodable when $L > N-S_I$.
Note that in case (a), $L$ is smaller than or equal to the number of redundant packets. Therefore, this case is fully protected with $(N, K)$ systematic codes, and we should not experience any decoding error. In case (b), $L$ exceeds the number of redundant packets, and we will have decoding errors. However, if $L$ does not exceed $(N-S_I)$, there is a possibility that all lost packets belong to packets from P and B frames (or redundant packets). In case (c), too many packets have been lost, and therefore the I frame is not decodable.
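The three cases can be turned into a direct computation by conditioning on the number of lost packets $L$. The sketch below assumes independent (Bernoulli) losses, so that for a given $L$ every set of $L$ lost packets is equally likely; in case (b) the chance that none of them hits the $S_I$ I-frame packets is then $\binom{N-S_I}{L} / \binom{N}{L}$. The function name is hypothetical, and the numbers in the usage lines are taken from the 92-packet example above.

```python
from math import comb


def q_i_by_cases(n, k, s_i, p):
    """Probability that the I-frame is decodable, summed over the number of lost packets."""
    total = 0.0
    for lost in range(n + 1):
        # P(L = lost) for n independent packets with loss probability p
        p_lost = comb(n, lost) * p ** lost * (1 - p) ** (n - lost)
        if lost <= n - k:
            cond = 1.0                                       # case (a): everything is recovered
        elif lost <= n - s_i:
            cond = comb(n - s_i, lost) / comb(n, lost)       # case (b): losses miss the I-frame
        else:
            cond = 0.0                                       # case (c): some I-frame packet is lost
        total += p_lost * cond
    return total


if __name__ == "__main__":
    # N = 92 packets, K = 72 source packets, S_I = 24 packets in the I-frame
    for p in (0.05, 0.1, 0.2, 0.3):
        print(p, q_i_by_cases(92, 72, 24, p))
```

Multiplying out the case (b) term gives $\binom{N-S_I}{L} p^{L} (1-p)^{N-L}$, which is where the factor $(1-p)^{S_I}$ in the closed form comes from.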
Combining all three situations, the probability that the I-frame is playable can be expressed as

$$q_I = \sum_{r=0}^{N-K} \binom{N}{r} p^{r} (1-p)^{N-r} + (1-p)^{S_I} \sum_{r=N-K+1}^{N-S_I} \binom{N-S_I}{r} p^{r} (1-p)^{N-S_I-r} = \Psi(N, K, p) + (1-p)^{S_I}\, \Psi(N-S_I, N-K+1, 1-p)$$

The playable rate of I frames can be expressed as

$$R_I = G \cdot q_I \tag{5}$$

Now consider a P frame. The $m$th P frame, $P_m$, is playable if its preceding I and P frames, and itself, are successfully transmitted. Assuming that the previous reference frames are available, we have the following three situations with respect to the number of lost packets $L$.
a) The $P_m$-frame is decodable when $L \le N-K$.
b) The $P_m$-frame is decodable with certain probability when $N-K < L \le N-S_I-mS_P$.
c) The $P_m$-frame is not decodable when $L > N-S_I-mS_P$.
Therefore, the play rate of the $m$th P-frame ($P_m$) is given by

$$q_{P_m} = \sum_{r=0}^{N-K} \binom{N}{r} p^{r} (1-p)^{N-r} + (1-p)^{S_I+mS_P} \sum_{r=N-K+1}^{N-S_I-mS_P} \binom{N-S_I-mS_P}{r} p^{r} (1-p)^{N-S_I-mS_P-r} = \Psi(N, K, p) + (1-p)^{S_I+mS_P}\, \Psi(N-S_I-mS_P, N-K+1, 1-p)$$

The playable rate of P frames can be expressed as

$$R_P = G \cdot q_P = G \cdot \sum_{m=1}^{N_P} q_{P_m} \tag{6}$$

For all B frames except those after the last P-frame, we have the following three situations with respect to the number of lost packets $L$.
a) The $B_{i,j}$-frame is decodable when $L \le N-K$.
b) The $B_{i,j}$-frame is decodable with certain probability when $N-K < L \le N-S_I-(i+1)S_P-S_B$.
c) The $B_{i,j}$-frame is not decodable when $L > N-S_I-(i+1)S_P-S_B$.
The probability of successful decoding of these B-frames is given by [6]

$$q_{B_{i,j}} = (1-p)^{S_I+(i+1)S_P+S_B}\, \Psi(N-S_I-(i+1)S_P-S_B, N-K+1, 1-p) + \Psi(N, K, p) \tag{7}$$

For the B frames after the last P-frame (i.e., those preceding the I-frame of the next GoP), successful decoding is possible only if both the I-frame (in the current GoP) and the I-frame of the next GoP are successfully decoded. Therefore, the probability of successful decoding of these B-frames is given by [6]

$$q_{B_{N_P,j}} = q_I \left[ (1-p)^{S_I+N_P S_P+S_B}\, \Psi(N-S_I-N_P S_P-S_B, N-K+1, 1-p) + \Psi(N, K, p) \right] \tag{8}$$

Combining (7) and (8), the playable rate of B frames can be expressed as

$$R_B = G \cdot q_B = G \cdot N_{BP} \sum_{i=0}^{N_P} q_{B_{i,0}} \tag{9}$$

The total playable frame rate is expressed by

$$R = R_I + R_P + R_B = G (q_I + q_P + q_B) \tag{10}$$

where $R_I$, $R_P$, and $R_B$ are calculated using Eqs. (5), (6) and (9), respectively.

The complexity of the packet generation depends on the FEC codes used. In this paper, we assume that the erasure codes are systematic codes, and hence only the redundant packets need to be generated. It can be shown that [17] for a typical compression framework, the complexity of the proposed technique is about 5-7 times that of Wu's technique [3]. However, since the generation of redundant packets makes up only a small part of the computation in streaming applications, we do not expect a significant impact on the overall computational complexity of the codec by replacing a frame-based FEC technique with a GoP-based FEC technique.

4 Performance Evaluation
In this section, the closed form formula derived in section 3 is compared with that of the frame level FEC model in [3]. The PFR computed using the two models will be compared in section 4.1. In section 4.2, we will use a nonscalable MPEG-4 trace and the NS-2 network simulator [7] to conduct FEC simulations for video streaming.

4.1 Model-based Analysis
We calculate the PFR of the proposed technique and compare it with that of [3]. The PFR in Eqs. (3) and (10) provides the rate in frames/sec. For simplicity, we can also express the PFR as a ratio (in %) as follows.

$$\text{PFR Ratio (in \%)} = \frac{\text{PFR}}{\text{Source Frame Rate}} \times 100 \tag{11}$$

The PFR ratio in Eq.
(11) provides the percentage of the frames in a GoP that can be decoded correctly at the receiver. The source frame rate typically varies between 15 and 30 frames/sec depending on the application. In the simulation, we assume a rate of 25 frames/sec to calculate the PFR ratio. The network settings, such as the packet size ($s_{pkt}$), the round-trip time ($t_{RTT}$), and the TCP retransmit timeout value ($t_{RTO}$), are taken from typical network connections. We assume UDP as the transport protocol. However, in order to avoid network congestion, we assume that the UDP transmission is TCP-friendly. We use the following formula to calculate the transmission rate in the network [4] for a given packet loss probability.

$$T = \frac{s_{pkt}}{t_{RTT} \sqrt{\dfrac{2p}{3}} + t_{RTO} \sqrt{\dfrac{27p}{8}}\; p \left(1 + 32p^{2}\right)} \tag{12}$$

The bitrate of a streamed video is highly variable, and can range from 64 Kbps to 10 Mbps. In this analysis, the bitrate is set at 1.15 Mbps (MPEG-1 VCD quality). A GoP is assumed to have 12 frames with $N_P = 3$, and $N_{BP} = 2$. The parameter values used in this analysis are $t_{RTT}$ = 50 ms, $t_{RTO}$ = 200 ms, $s_{pkt}$ = 500 or 1000 bytes, and the network loss probability $p$ = [0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]. Note that a bitrate of 1.15 Mbps will result in approximately 300 and 150 packets/sec for packet sizes of 500 and 1000 bytes, respectively.

Fig. 4 shows the PFR of the frame-based as well as GoP based techniques with no FEC. The plot was generated using Eqs. (3), (10) and (11). Since there is no FEC in the streaming, the performance of the frame-based FEC and the GoP-based FEC is identical. It is observed that the PFR deteriorates very quickly. We can see that only around 40% of the frames can be delivered error free at $p$ = 0.02, which is a fairly small packet loss probability. Fig. 4 also shows the effect of packet size on the PFR. In order to keep a constant channel-coding rate, the number of packets was doubled when $s_{pkt}$ = 500. It is observed that the streaming performance with $s_{pkt}$ = 1000 is significantly better.
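The no-FEC behaviour described above can be checked numerically with a short sketch of Eqs. (1), (3), (5)-(10) and (12). Several things here are assumptions made for illustration rather than values stated in the paper: the function names, the frame sizes of (24, 8, 3) packets for 1000-byte packets (doubled for 500-byte packets), and the simplification that the result is reported as the expected playable fraction of one GoP, that is, the PFR divided by $G$ and by the GoP length, which matches the ratio of Eq. (11) only when $G$ equals the source GoP rate.

```python
from math import comb, sqrt


def psi(n, k, p):
    """Eq. (1): probability that at least k of n packets arrive (loss probability p)."""
    return sum(comb(n, r) * (1 - p) ** r * p ** (n - r) for r in range(k, n + 1))


def tcp_friendly_rate(s_pkt, t_rtt, t_rto, p):
    """Eq. (12): TCP-friendly rate in bytes/sec (s_pkt in bytes, times in seconds)."""
    return s_pkt / (t_rtt * sqrt(2 * p / 3) + t_rto * sqrt(27 * p / 8) * p * (1 + 32 * p ** 2))


def frame_level_fraction(sizes, fec, n_p, n_bp, p):
    """Eq. (3) divided by G and by the GoP length."""
    q_i, q_p, q_b = (psi(s + f, s, p) for s, f in zip(sizes, fec))
    chain = sum(q_p ** m for m in range(1, n_p + 1))   # equals (q_P - q_P^(N_P+1)) / (1 - q_P)
    frames = q_i * (1 + chain + n_bp * q_b * (chain + q_i * q_p ** n_p))
    return frames / (1 + n_p + (n_p + 1) * n_bp)


def gop_level_fraction(sizes, n_fec, n_p, n_bp, p):
    """Eqs. (5)-(10) divided by G and by the GoP length, with n_fec = N - K."""
    s_i, s_p, s_b = sizes
    k = s_i + n_p * s_p + (n_p + 1) * n_bp * s_b       # source packets per GoP
    n = k + n_fec

    def q(needed):
        # Either at most N-K packets are lost, or all `needed` source packets
        # arrive and the excess losses fall on the remaining n - needed packets.
        return psi(n, k, p) + (1 - p) ** needed * psi(n - needed, n - k + 1, 1 - p)

    q_i = q(s_i)
    q_p = sum(q(s_i + m * s_p) for m in range(1, n_p + 1))
    q_b = n_bp * sum(q(s_i + (i + 1) * s_p + s_b) for i in range(n_p))
    q_b += n_bp * q_i * q(s_i + n_p * s_p + s_b)       # B frames before the next I frame
    return (q_i + q_p + q_b) / (1 + n_p + (n_p + 1) * n_bp)


if __name__ == "__main__":
    for sizes, s_pkt in (((24, 8, 3), 1000), ((48, 16, 6), 500)):
        frac = frame_level_fraction(sizes, (0, 0, 0), 3, 2, 0.02)
        rate = tcp_friendly_rate(s_pkt, 0.05, 0.2, 0.02) / s_pkt
        print(f"s_pkt={s_pkt}: playable fraction {frac:.3f}, TCP-friendly rate {rate:.1f} packets/sec")
```

Under these assumptions the two models coincide when no FEC is added, and the sketch gives a playable fraction of about 0.41 at $p$ = 0.02 for 1000-byte packets and a clearly lower value for 500-byte packets.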
This is mainly because, with the same packet loss probability, the playable frame rate is statistically better (see Eq. (3)) when the number of packets is small. When $s_{pkt}$ = 500, we use more packets, and therefore, the PFR drops.

Fig. 5 shows the improvement of PFR when FEC is added. The symbol ($S_{IF}$, $S_{PF}$, $S_{BF}$) corresponds to the number of FEC packets for I-, P-, and B-frames for

the frame-based FEC. In other words, the total number of redundant packets (for one GoP) is given by

$$S_{IF} + N_P S_{PF} + (N_P + 1) N_{BP} S_{BF}$$

In order to compare the frame and GoP based techniques, we add an identical number of redundant packets to a GoP source.

Fig. 4: Comparison of PFR ratio with no FEC for parameters $t_{RTT}$ = 50 ms, and $t_{RTO}$ = 200 ms.

Fig. 5(a) shows the PFR after we provide a light weight FEC (1,1,0) whereas Fig. 5(b) shows the PFR when a moderate FEC (4,2,0) is imposed. It is observed that the proposed technique provides a significant PFR improvement over the frame-based FEC technique. It has been found that the proposed technique provides a performance similar to the frame-based technique at high FEC, and this has not been shown in the figure.

Although FEC in general improves the PFR in a lossy network, a heavy weight FEC need not necessarily perform better than a light weight FEC. A close look at Fig. 5(a) and 5(b) will reveal that at $p$ = 0.02, FEC (1,1,0) provides a better performance than FEC (4,2,0). This is primarily because of the FEC overhead. If the FEC provided exceeds an appropriate level, it occupies unnecessary extra bandwidth that could have been used to transmit source packets. It has been shown by experiments that for a marginally lossy network, a light weight FEC provides the best performance, whereas for a moderately lossy network, a medium weight FEC provides the best performance. Finally, for a highly lossy network, a heavy weight FEC provides the best performance. In all three situations, the proposed GoP based FEC provides a superior performance compared to the frame-based FEC technique.

Fig. 5: Comparison of PFR ratio with parameters ($t_{RTT}$ = 50 ms, $t_{RTO}$ = 200 ms) for (a) FEC (1,1,0) and (b) FEC (4,2,0). Note that the number of FEC packets in (a), (b) are respectively, 4, and 10.

4.2 Simulation-based Analysis
In section 4.1, we have compared the performance of the GoP-based and frame-based techniques. The comparison was done analytically with fixed model parameters. However, in practice, network conditions and frame sizes vary statistically at various temporal scales. To obtain a more realistic performance comparison, we have evaluated the performance on the NS-2 network simulator. Instead of using a fixed mean compressed frame size to compute the PFR, we used the real downloadable trace files of videos generated by an MPEG-4 encoder [6]. The streaming server reads entries from a trace file, generates source packets, and passes the source packets to the UDP agent for transmission. Since the trace file represents a variable bitrate compressed video bitstream, the client contains a receiver buffer to smooth out the bitstream variations. The decoder then periodically accesses the receiver buffer to retrieve packets for decoding. If all packets for a frame and its reference frame(s) are received, the frame is labeled playable; otherwise, the frame is declared unplayable.
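The playability rule for frame-based and GoP-based FEC can be illustrated with a small Monte Carlo sketch. It is a simplified stand-in and not the NS-2 setup: it assumes fixed frame sizes instead of a trace, independent packet losses, an ideal erasure code that repairs up to as many losses as there are FEC packets, and it ignores the bandwidth cost of the FEC packets. The function name, the frame sizes (24, 8, 3), and the default parameters are hypothetical.

```python
import random


def simulate(p, sizes, fec, n_p, n_bp, gop_level, n_gops=2000, seed=1):
    """Fraction of playable frames over n_gops GoPs with independent packet loss.

    sizes = (S_I, S_P, S_B) and fec = (S_IF, S_PF, S_BF) in packets. With
    gop_level=True the same total number of FEC packets protects the whole GoP.
    """
    rng = random.Random(seed)
    kinds = "I" + "P" * n_p + "B" * ((n_p + 1) * n_bp)   # I, P_1..P_NP, then the B frames
    size = dict(zip("IPB", sizes))
    extra = dict(zip("IPB", fec))
    total_fec = sum(extra[k] for k in kinds)

    def losses(count):
        return sum(rng.random() < p for _ in range(count))

    decoded = []
    for _ in range(n_gops):
        lost_src = [losses(size[k]) for k in kinds]
        if gop_level:
            # Systematic erasure code: up to N-K losses are repaired; beyond that
            # only frames whose own source packets all arrived survive.
            repaired = sum(lost_src) + losses(total_fec) <= total_fec
            decoded.append([repaired or lost == 0 for lost in lost_src])
        else:
            # Frame-level FEC: each frame tolerates as many losses as it has FEC packets.
            decoded.append([lost + losses(extra[k]) <= extra[k]
                            for k, lost in zip(kinds, lost_src)])

    playable = 0
    for g, d in enumerate(decoded):
        ref = [d[0]]                                     # ref[m]: I and P_1..P_m all decoded
        for m in range(1, n_p + 1):
            ref.append(ref[-1] and d[m])
        # Assumption: the I frame after the final GoP is treated as available.
        next_i = decoded[g + 1][0] if g + 1 < n_gops else True
        playable += sum(ref)                             # playable I and P frames
        for i in range(n_p + 1):
            refs_ok = ref[i + 1] if i < n_p else (ref[n_p] and next_i)
            for j in range(n_bp):
                playable += d[1 + n_p + i * n_bp + j] and refs_ok
    return playable / (n_gops * len(kinds))


if __name__ == "__main__":
    for p in (0.02, 0.05):
        frame = simulate(p, (24, 8, 3), (4, 2, 0), 3, 2, gop_level=False)
        gop = simulate(p, (24, 8, 3), (4, 2, 0), 3, 2, gop_level=True)
        print(f"p={p}: frame-based {frame:.3f}, GoP-based {gop:.3f}")
```

Both schemes spend the same 10 FEC packets per GoP in this example; the frame-based (4,2,0) allocation leaves the B frames unprotected, whereas the GoP-based scheme lets any frame draw on the shared redundancy.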

In our simulation, a 10-minute clip was streamed out of the movie Die Hard. The movie clip was encoded at medium quality using an MPEG-4 encoder. Independent packet loss events during a streaming session were assumed throughout our simulations. The simulation results with different FEC configurations are illustrated in Fig. 6. For every value of $p$, we used ten different seed values for the random number generator to generate different loss patterns. In Fig. 6, the mean PFR values are plotted for each $p$. To show the effectiveness of the FEC, the PFR values without FEC are also plotted for comparison.

Fig. 6: The effect of adjusting FEC configuration on the performance of video streaming using a nonscalable MPEG-4 source trace (panels (a) and (b)).

It is observed in Fig. 6 that the GoP based FEC technique performs better than the frame-based technique in most cases. It may be apparent from Fig. 6(a) that the frame-based technique provides a better performance than the GoP based technique for FEC (1,1,0) when $p$ exceeds 0.12. However, the FEC configuration (1,1,0) is only optimal near a packet loss probability of 0.005. When the packet loss probability exceeds 0.12, stronger FEC protection such as (4,2,0) should be employed. In that case, the GoP based FEC technique will perform better. In other words, it can be concluded that the GoP based FEC technique always performs better than the frame-based technique when an optimal FEC configuration is used.

5 Conclusions
In this paper a new analytical model is derived to evaluate a media-dependent FEC scheme for video streaming applications. It is shown in the analytical results that in most typical network conditions, the usage of a GoP-level FEC scheme should be preferred over a frame-level FEC scheme. The analytical results are validated by experimental simulations on the NS-2 network simulator. Our model can be used to compute the optimal allocation of FEC packets for compressed video streams of different rates at a given estimate of the network loss probability.
It is clear that the results hold for any type of video data compressed by hybrid video encoders.

References:
[1] K. Park and W. Wang, QoS-sensitive Transport of Real-time MPEG Video using Adaptive Forward Error Correction, Proc. of IEEE Multimedia Systems, June 1999, pp. 426-432.
[2] K. Mayer-Patel, L. Le, and G. Carle, An MPEG Performance Model and its Application to Adaptive Forward Error Correction, Proc. of ACM Multimedia, December 2002, pp. 1-10.
[3] H. Wu, M. Claypool, and R. Kinicki, A Model for MPEG with Forward Error Correction and TCP-friendly Bandwidth, Proc. of the ACM NOSSDAV, 2003, pp. 122-130.
[4] J. Padhye, V. Firoiu, et al., Modeling TCP Reno Performance: A Simple Model and its Empirical Validation, IEEE/ACM Transactions on Networking, Vol. 8, Issue 2, April 2000, pp. 133-145.
[5] L. Rizzo, Effective Erasure Codes for Reliable Computer Communication Protocols, ACM SIGCOMM Computer Comm. Rev., Vol. 27, No. 2, April 1999, pp. 24-36.
[6] Y. Yuan, Wavelet Video Coding with Application in Network Streaming, Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada, Spring 2005.
[7] VINT Project, The Network Simulator NS-2, http://www.isi.edu/nsnam/ns/.