Scalable multiple description coding of video sequences

Marco Folli and Lorenzo Favalli
Electronics Department, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy
Email: marco.folli@unipv.it, lorenzo.favalli@unipv.it

Abstract

In this paper, we address the problem of robust transmission of compressed visual information and propose two transmission schemes based on Multiple Description Scalable Coding (MDSC). In both schemes, we generate four spatial subsequences from the original video stream by subsampling in both the horizontal and vertical directions. Each description is then formed from two subsequences and encoded with the H.264/SVC video coder, thus generating two scalable descriptors. In the first scheme, we predict one subsequence from the other using the inter layer prediction tools. In the second scheme, we exploit the redundancy between the subsequences with the hierarchical dyadic B frame prediction algorithm. These schemes perform better than the most widely used spatial multiple description coding solution, Polyphase Spatial Subsampling multiple description (PSS-MD), at the cost of a very low additional system complexity.

KEYWORDS

H.264/SVC, Multiple Description Coding, scalability, hierarchical B pictures, inter layer prediction

I. INTRODUCTION

Video communications are becoming increasingly popular and increasingly important for the global information infrastructure, thanks to growing available bandwidth and to ever more efficient video compression techniques. Despite these advances, bandwidth requirements force a trade-off between quality guarantees and resource utilization. Scalability has always been seen as a possible way to cope with this problem and, more recently, a new approach has emerged in the literature in which the coder generates multiple data streams called descriptions or descriptors.
While scalability techniques produce multiple streams of which only one (the base layer) is independently decodable, all the descriptions are independently decodable. To achieve this, it is necessary to introduce some amount of redundancy in each descriptor. The descriptors can then be transmitted over different physical channels, or over the same physical channel but with different error protection or rate (different logical channels). At the receiver, if all the descriptions are received, it is ideally possible to recover the quality of the original stream. Otherwise, if only some descriptions are available, the original sequence is still decodable, but at a lower quality. A comprehensive overview of various MDC techniques is provided by Goyal [1].

A simple and efficient scheme for multiple description video coding consists in splitting the even and odd pictures of a video sequence into two separate descriptions, each of which can be encoded with a standard coder such as H.264. Such a multiple description algorithm was developed in [2]. Subsequently, Wang [3] improved this scheme by adding motion compensation between the descriptors. Another efficient and simple approach to generate multiple description streams was proposed by Vitali et al. [4], using spatial subsampling in both the horizontal and vertical directions, thus obtaining four descriptions. This kind of algorithm is called polyphase spatial subsampling multiple description (PSS-MD) coding. In their work, they show that in error prone networks such a scheme provides robustness better than or equal to that of other solutions such as forward error correction (FEC), but with a lower system complexity. A possible improvement of this scheme is proposed in [5], where the original sequence is first oversampled via DCT transform and zero padding, and the descriptors are then obtained by subsampling this oversampled sequence.
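The polyphase spatial subsampling at the heart of PSS-MD can be illustrated with a minimal Python sketch. It is not the implementation used in [4]: the function names `polyphase_split` and `polyphase_merge`, the representation of a frame as a list of pixel rows, and the toy 4x4 frame are all assumptions made for the example.

```python
def polyphase_split(frame):
    """Split a frame (list of pixel rows) into four polyphase subframes.

    Subframe (r, c) keeps the pixels on rows r, r+2, ... and columns c, c+2, ...
    The (row phase, column phase) keys are a hypothetical labelling.
    """
    return {(r, c): [row[c::2] for row in frame[r::2]]
            for r in (0, 1) for c in (0, 1)}


def polyphase_merge(subframes, height, width):
    """Rebuild the full frame when all four subframes are available."""
    frame = [[None] * width for _ in range(height)]
    for (r, c), sub in subframes.items():
        for i, sub_row in enumerate(sub):
            for j, pixel in enumerate(sub_row):
                frame[r + 2 * i][c + 2 * j] = pixel
    return frame


# Toy 4x4 frame whose pixel value encodes its position (10 * row + column).
frame = [[10 * y + x for x in range(4)] for y in range(4)]
subs = polyphase_split(frame)
restored = polyphase_merge(subs, 4, 4)
```

Each of the four subframes has half the width and half the height of the original, and merging all four restores the frame exactly; the loss of one or more subframes is what the receiver has to conceal by interpolation.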
These schemes are only aimed at increasing the robustness of the video stream, without taking into consideration other transmission challenges, such as bandwidth variations or device heterogeneity, for which a scalable approach is required.

Fig. 1. Example of polyphase downsampling system

GTTI 08 - Sessione elaborazione dei segnali (signal processing session)

Recently, some multiple description scalable coding (MDSC) schemes have been proposed as efficient hybrid solutions. A very simple extension of the method proposed in [2] is the algorithm described in [6], in which two descriptions are obtained by temporal subsampling, and the
scalability is simply achieved by hierarchical B prediction. For the PSS scheme, several interesting scalable approaches have been proposed. In [7], a combination of motion compensation and spatial subsampling is implemented; in [8], some of the spatial correlation is exploited by predicting two of the four subsequences from their neighboring ones, after which the residuals are coded using the discrete cosine transform and variable length coding. Finally, in [9] we developed a further improvement of the previous scheme, by adding a sign bit to the predicted subsequences and then coding them using a standard coder (H.264/SVC).

Since the original subsequences are obtained by simple subsampling, their correlation is very high and consequently coding efficiency is low. The starting point of this paper is to develop PSS schemes capable of removing some of this redundancy while introducing some form of scalability. In order to be compatible with a standard H.264/SVC coder, all MDC operations are performed as pre- and post-processing, so that the different descriptions can be implemented using some of the scalable coding tools of H.264/SVC. The first method, called Inter Layer Prediction Spatial Multiple Description Scalable Coding (ILPS-MDSC), takes advantage of the inter layer prediction method used to generate spatial or coarse grain streams. The second one, called Hierarchical B Frame Prediction Spatial Multiple Description Scalable Coding (HBFPS-MDSC), uses the dyadic B frame prediction scheme, needed for temporal scalability, to predict one of the subsequences from another one.

The proposed algorithms are presented in detail in section III, after a description of the scalable coding tools implemented in H.264/SVC in section II. A description of their implementation on top of the H.264/SVC coder and simulation results are provided in section IV.

II. DESCRIPTION OF SCALABLE CODING TOOLS

H.264/AVC [10] is one of the latest international video coding standards. It uses state-of-the-art coding tools and provides enhanced coding efficiency for a wide range of applications, including video telephony, video conferencing and video streaming. It outperforms the MPEG2 standard, at least doubling its coding performance while keeping the cost acceptable. The scalable extension of H.264/AVC, named H.264/SVC, uses a layered approach to provide spatial, temporal and SNR scalability. These options may be used at the same time, so that a set of spatio-temporal-SNR streams can be generated and decoded from a global scalable video stream, according to the selected encoder configuration. The scalable extension of H.264/AVC extends the hybrid coding approach of this coder toward motion compensated temporal filtering. However, the algorithm that performs the prediction and update steps is similar to the motion compensation techniques already used in the generalized B frame approach of H.264/AVC, in order to guarantee backward compatibility. Further details on H.264/SVC can be found in [11].

A. Hierarchical B Frame Prediction Structure

The temporal prediction structure is changed relative to H.264/AVC. In this coder, low pass pictures, resulting from the update step, are generated in a different way than high pass pictures, which follow the prediction step as in H.264/AVC. Basically, a group of pictures (GOP) is generated and partitioned into two sets of pictures. The decomposition is then performed such that the high pass pictures are aligned with one of the partitions and the low pass pictures are aligned with the other one. However, when the GOP size is greater than 2, it is advantageous to iterate the partition of the GOP hierarchically in order to obtain only a single low pass picture. The most common way to do this is a temporal dyadic prediction, as seen in fig. 2.
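The dyadic structure can be made concrete with a small Python sketch that assigns a temporal level to each picture of a GOP. This is an illustration only, assuming a GOP size that is a power of 2 and numbering levels from 0 (the pictures at the GOP boundaries) upwards; the names `temporal_level` and `drop_highest_level` are hypothetical and do not correspond to anything in the H.264/SVC software.

```python
def temporal_level(index, gop_size):
    """Temporal level of picture `index` in a dyadic GOP.

    Assumes gop_size is a power of 2. Pictures at multiples of gop_size
    get level 0; the pictures predicted last get the highest level.
    """
    if index % gop_size == 0:
        return 0
    level = gop_size.bit_length() - 1  # log2(gop_size) for a power of 2
    while index % 2 == 0:
        index //= 2
        level -= 1
    return level


def drop_highest_level(num_pictures, gop_size):
    """Indices of the pictures that remain once the top level is discarded."""
    top = gop_size.bit_length() - 1
    return [i for i in range(num_pictures)
            if temporal_level(i, gop_size) < top]


levels = [temporal_level(i, 8) for i in range(9)]
kept = drop_highest_level(9, 8)
```

For a GOP of 8, every second picture sits at the highest level and no other picture depends on it, so removing that level halves the frame rate while leaving the remaining pictures decodable.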
The delay introduced by this algorithm is coupled with the use of reference pictures that are displayed later than the predicted or updated pictures. Therefore, if the reference pictures are in the past relative to the predicted pictures and the update step is omitted, no additional delay is introduced. When the GOP size is very large, it is possible to control the maximum delay of the prediction structure (for real time applications) by performing the update step only in a subpartition of the GOP, in order to meet the delay constraints. It is easy to see that this scheme guarantees temporal scalability, since it is possible to remove those parts of the bit-stream that are not used as reference pictures for the remaining pictures and that correspond to the highest hierarchical layer of the prediction/update structure.

Fig. 2. Dyadic temporal decomposition of a group of 16 pictures

Fig. 3. Hierarchical prediction structure applied to our method

B. Inter-layer Prediction

Spatial scalability in H.264/SVC is performed only considering oversampling factors which are powers of 2 in
both horizontal and vertical resolution. The video signal is then represented using an oversampled pyramid and, as a first interpretation, every spatial resolution (i.e. spatial layer) can be coded independently from the others. However, it is clear that a higher resolution layer (e.g. the 4CIF layer) is correlated with the lower resolution layers. Consequently, it is possible to improve the performance of the scalable coding algorithm by exploiting the redundancy between neighboring layers. In addition, increased efficiency is obtained by allowing the encoder to freely choose which dependencies between spatial resolution layers are exploited, through a switchable prediction mechanism. In our work we have developed two different techniques that provide good performance gains; they are described in the next section. The inter layer prediction tools are:
prediction of macroblocks using up-sampled lower resolution signals;
prediction of motion vectors using up-sampled lower resolution motion vectors;
prediction of residual signals using up-sampled residual signals of the lower resolution layer.
The same techniques can also be applied when the base layer has the same spatial resolution as the current layer (i.e. coarse grain scalability). In this case, the up-sampling operations are simply skipped.

Fig. 4. Coder structure needed to perform ILPS-MDSC with inter layer prediction structure highlighted

III. PROPOSED SCHEMES

We have developed two different schemes to generate multiple descriptions, both of them based on the prediction algorithms of the scalable extension of H.264/AVC. In order to preserve the standard coder, a pre- and post-processing scheme is implemented. In the pre-processing part, we down sample the original sequence by rows and columns, thus generating four different sub-frames, similarly to what is done in PSS-MD. An overall view of the process can be seen in fig. 1. Then, descriptions are formed by coupling two different subsequences and sending them to the same standard scalable coder, thus obtaining two scalable descriptors that can be transmitted independently. In the post-processing part, the original sequence is obtained by merging the descriptions. In case of a lost description or discarded enhancement layers, the missing pixels are reconstructed by interpolation from the received ones.

Fig. 5. Image subsampling patterns. a: by rows, b: quincunx

Fig. 6. Rate distortion, performance when receiving only one subsequence (by rows scheme). Curves: HBFPS-MDSC desc. 1 BL and desc. 2 BL, ILPS-MDSC desc. 1 BL and desc. 2 BL, PSS-MD one desc.

Fig. 7. Rate distortion, performance when receiving only one subsequence (quincunx scheme). Curves: HBFPS-MDSC desc. 1 BL and desc. 2 BL, ILPS-MDSC desc. 1 BL and desc. 2 BL, PSS-MD one desc.

A. Hierarchical B Frame Prediction Spatial - Multiple Description Scalable Coding

After obtaining the four subsequences, as in PSS-MD, we take two of them and generate a new video sequence by temporal interleaving, alternately taking one picture from the first subsequence and one from the other, so that the new sequence has twice the original frame rate. Then, the same scheme is applied also to the two remaining subsequences,
so that at the end we only get two different new streams that will form the descriptors. At this point, we code each of them using hierarchical dyadic B frame prediction, discussed in section II-A, in order to fully predict one of the subsequences from the other one. By doing so, the redundancy is reduced to a minimum and scalability is also achieved, since discarding the predicted subsequence leaves the description still decodable. The structure of the interleaved sequence and an example of the prediction of a GOP are depicted in fig. 3. We underline that this method is very simple and is compatible with virtually every coder that supports dyadic B prediction.

Fig. 8. Rate distortion, performance when receiving two subsequences (by rows scheme). Curves compare HBFPS-MDSC, ILPS-MDSC and PSS-MD with two descriptions.

Fig. 9. Rate distortion, performance when receiving two subsequences (quincunx scheme). Curves compare HBFPS-MDSC, ILPS-MDSC and PSS-MD with two descriptions.

Fig. 10. Rate distortion, performance when receiving three subsequences (by rows scheme). Curves compare HBFPS-MDSC, ILPS-MDSC and PSS-MD with three descriptions.

Fig. 11. Rate distortion, performance when receiving three subsequences (quincunx scheme). Curves compare HBFPS-MDSC, ILPS-MDSC and PSS-MD with three descriptions.

B. Inter Layer Prediction Spatial - Multiple Description Scalable Coding

As in the previous scheme, we arrange the four subsequences obtained in the pre-processing part into two groups of two subsequences each, which will form the descriptions.
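Before turning to the inter layer variant, the temporal interleaving that HBFPS-MDSC applies to each pair of subsequences (section III-A) can be illustrated with a minimal Python sketch. The function names are hypothetical, pictures are represented by string labels, and placing the predicted subsequence at the odd positions is an assumption made for the example.

```python
def interleave(seq_a, seq_b):
    """Alternate pictures of two subsequences: a0, b0, a1, b1, ...

    The result has twice as many pictures per second as either input.
    """
    out = []
    for pic_a, pic_b in zip(seq_a, seq_b):
        out.extend([pic_a, pic_b])
    return out


def deinterleave(seq):
    """Recover the two subsequences from an interleaved sequence."""
    return seq[0::2], seq[1::2]


sub_a = ["a0", "a1", "a2", "a3"]  # subsequence kept as reference
sub_b = ["b0", "b1", "b2", "b3"]  # subsequence assumed to be predicted

description = interleave(sub_a, sub_b)
first, second = deinterleave(description)

# Temporal scalability: keeping every second picture leaves one
# complete subsequence at the original frame rate.
base_only = description[0::2]
```

With this ordering, discarding every second picture of the interleaved sequence removes exactly one subsequence, which is why temporal scalability of the coder translates into scalability of the description.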
In this method, the inter layer prediction algorithm of the scalable extension of H.264/AVC, described in section II-B, is used instead of the hierarchical B frame prediction. In particular, we configure the coder to generate a coarse grain scalable (CGS) stream with only one enhancement layer. Instead of sending the same subsequence for the base and the enhancement layer (which is a requirement for CGS), we use the inter layer prediction algorithm to remove the redundancy between the subsequences by assigning one of them to the base layer and the other one to the enhancement layer, with the coder structure represented in figure 4. By doing this, most of the correlation is eliminated. Scalability is now simply obtained by discarding the enhancement layer, with both descriptors still decodable. At the decoder, we reconstruct the original subsequences that form the description by first decoding the full stream (base plus enhancement layer), which gives the predicted subsequence; the other subsequence is then obtained by decoding the base layer only.

IV. RESULTS

The software used in our experiments is H.264/SVC version 8.1. The different options provided by the coder have been set as follows:
1/4 pixel accuracy for motion estimation;
a single reference frame;
GOP size 8;
I frame only at the beginning;
16x16, 16x8, 8x16, 8x8 inter-prediction blocks with SAD metric;
CABAC;
CIF sequence with fps.

Results are reported using the sequences foreman (video calling environment) and football (high motion sequence). For the sake of comparison, our approaches are compared with PSS-MD. We chose two different methods to couple the subsequences generated in the pre-processing part. The first scheme, called by rows, consists in coupling the subsequences that form the rows of the original video sequence, so that one description contains the pixels of the odd rows while the other contains those belonging to the even rows. The second scheme, called quincunx, groups the subsequences so that the pixels form a quincunx lattice. Figure 5 shows how the subsequences are coupled to form the descriptions. In fig. 5.a we can see the by rows scheme, where subsequences one and two form the first description, while subsequences three and four form the second one. In fig. 5.b the quincunx method is shown, with subsequences one and four generating the first description and subsequences two and three generating the other.

Before showing the results, we make a remark about the interpolation schemes needed at the receiver when not all substreams are received. We use different interpolation algorithms according to the number and type of subsequences received:
if only one subsequence is received (i.e. only the base layer of one of the two descriptors), we recover the missing information by computing the mean of the nearest pixels;
if two subsequences are received in a by rows fashion, we recover the missing information as the mean of the two nearest pixels;
if two subsequences are received in a quincunx fashion, the missing information is recovered as the mean of the four nearest pixels;
if three of the four subsequences are received, the missing information is obtained by interpolation of the eight nearest pixels.
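The interpolation rules listed above can be sketched in Python as a single routine that averages the received pixels in a chosen neighborhood. This is only an illustration under stated assumptions: the function name `fill_missing`, the neighborhood constants, the handling of frame borders (neighbors outside the frame are ignored) and the toy 4x4 frame are not taken from the paper.

```python
# Neighborhoods as (row offset, column offset) pairs.
VERTICAL = [(-1, 0), (1, 0)]                            # two nearest pixels
CROSS = VERTICAL + [(0, -1), (0, 1)]                    # four nearest pixels
SQUARE = CROSS + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # eight nearest pixels


def fill_missing(frame, known, offsets):
    """Replace each missing pixel by the mean of its received neighbors.

    `known[y][x]` is True where the pixel was received. Neighbors that
    fall outside the frame or were not received are skipped (assumption).
    """
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            if known[y][x]:
                continue
            values = []
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and known[ny][nx]:
                    values.append(frame[ny][nx])
            if values:
                out[y][x] = sum(values) / len(values)
    return out


# Toy frame with a linear ramp, so that exact recovery is easy to check.
truth = [[10 * y + x for x in range(4)] for y in range(4)]


def receive(known):
    """Simulate reception: missing pixels are set to 0."""
    return [[truth[y][x] if known[y][x] else 0 for x in range(4)]
            for y in range(4)]


rows_known = [[y % 2 == 0 for x in range(4)] for y in range(4)]
quincunx_known = [[(y + x) % 2 == 0 for x in range(4)] for y in range(4)]
three_known = [[not (y % 2 == 1 and x % 2 == 1) for x in range(4)]
               for y in range(4)]

by_rows = fill_missing(receive(rows_known), rows_known, VERTICAL)
quincunx = fill_missing(receive(quincunx_known), quincunx_known, CROSS)
three = fill_missing(receive(three_known), three_known, SQUARE)
```

On this linear test frame the interior pixels are recovered exactly; on real pictures the interpolation error sets the asymptotic quality that a partial reception can reach, whatever the bitrate.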
We have carried out two different experiments in our simulations. In the first experiment, we have evaluated the performance of both methods compared to the PSS-MD method via empirical rate distortion curves. In the second experiment, we have found the optimal operating point of the ILPS-MDSC method by varying the ratio between the base layer rate and the full description rate. Results are reported in subsections IV-A and IV-B respectively.

A. RATE DISTORTION CURVE

The results are shown as a rate distortion (RD) curve, from 100 kbit/s to 1900 kbit/s with a step of 200 kbit/s. In the ILPS-MDSC approach, the total bitrate is evenly divided between the subsequence that forms the base layer and the one transmitted as an enhancement layer. In the HBFPS-MDSC algorithm, the available bitrate is the target bitrate of the interleaved sequence, so we do not apply any rate subdivision among the subsequences that form a description and let the coder perform its best. In this subsection, all the results are reported using the foreman sequence, without loss of generality.

Fig. 12. Rate distortion, performance when receiving all the subsequences. Curves: HBFPS-MDSC and ILPS-MDSC (by rows and quincunx, both descriptions) and PSS-MD (all descriptions).

Fig. 13. Optimal ratio, performance when receiving only one subsequence. Curves: ILPS-MDSC, by rows and quincunx, desc. 1 BL, at 500, 1000 and 1500 kbit/s.

Figures 6 and 7 show the performance when only one subsequence is received. As we can see, both methods outperform the simple PSS-MD algorithm and, in particular, the quincunx scheme gives very similar performance regardless of the method used or of which descriptor is received.
Also, it is possible to see that the proposed schemes very quickly reach the asymptotic value given by the reconstruction algorithm. Figures 8 and 9 show the performance when receiving two subsequences, either a full description or the base layer of both descriptions. Two considerations can be made. First, when receiving one full description, the performance of HBFPS-MDSC is always better than that of PSS-MD, while the ILPS-MDSC scheme has more stable performance. Second, when we receive the base layer of both descriptions, the performance of the two methods is very different, and the behavior is opposite for the by rows and the quincunx schemes: in this case, the performance of the by rows scheme, when receiving both base layers of
the descriptors, outperforms any other combination of two subsequences. This is due to the particular features of the original sequence, which favor a quincunx scheme over a by rows scheme; the effect can be quantified by computing the asymptotic performance of the reconstruction algorithm, i.e. by coding the description at infinite bitrate. These values are shown in table I for some very common video sequences: foreman, football, mobile and tempete. In particular, the quincunx method seems to give similar and better performance for both descriptions in every video sequence considered. A possible interpretation of this result can be made by taking into account the larger distance between the pixels of the subsequences in a quincunx scheme, so that the description itself carries more information than in the by rows scheme. However, this advantage is lost when both descriptions are received, since more information implies less redundancy that can be exploited by the scalable tools of H.264/SVC, giving lower coding efficiency.

TABLE I. ASYMPTOTIC PERFORMANCE OF INTERPOLATION METHODS
Columns: Desc. 1, by rows; Desc. 2, by rows; Desc. 1, quincunx; Desc. 2, quincunx (numeric entries are incomplete)
Foreman.2 35.3 35.5
Football 35.1.4.3.3
Tempete.9.5
Mobile 25.5 25.2.5.5

Fig. 14. Optimal ratio, performance when receiving two subsequences (by rows scheme). Curves: ILPS-MDSC, desc. 1 and desc. 1 BL plus desc. 2 BL, at 500, 1000 and 1500 kbit/s.

Fig. 15. Optimal ratio, performance when receiving two subsequences (quincunx scheme). Curves: ILPS-MDSC, desc. 1 and desc. 1 BL plus desc. 2 BL, at 500, 1000 and 1500 kbit/s.

Fig. 16. Optimal ratio, performance when receiving three subsequences (by rows scheme). Curves: ILPS-MDSC, one full description plus the base layer of the other, at 500, 1000 and 1500 kbit/s.

Figures 10 and 11 show the performance when three subsequences are received, meaning a full description plus the base layer of the other one. Now, both methods seem to perform better than the standard PSS-MD algorithm and have almost the same performance, which partially confirms the previous considerations. We can also note that the advantage of HBFPS-MDSC over ILPS-MDSC is slightly larger with the by rows scheme than with the quincunx scheme. The reason seems to lie in the scalable coding tool used in the HBFPS-MDSC scheme: the hierarchical prediction structure needs adjacent video pictures to be strongly correlated in order to exploit all the redundancy well, so it is better suited to a by rows scheme than to a quincunx scheme. Finally, in figure 12, we report the performance when both descriptions are fully received. In this case, we observe that HBFPS-MDSC performs really well with the by rows scheme, but its performance drops with the quincunx method (below that of PSS-MD), for the reasons given above. The ILPS-MDSC scheme, instead, gives the best overall performance.

B. OPTIMAL RATIO FOR ILPS-MDSC

In these simulations, we report the results as the average over the sequence at different base layer to full description ratios, from 10 to 90 percent.
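As a worked illustration of what a base layer to full description ratio means in terms of bitrate, the following Python sketch splits the total rate of one description between its two layers. The function name `split_rate` and the 10 percent step between ratios are assumptions; the totals of 500, 1000 and 1500 kbit/s are the ones used in these experiments.

```python
def split_rate(total_kbps, base_ratio_percent):
    """Return (base layer rate, enhancement layer rate) in kbit/s."""
    base = total_kbps * base_ratio_percent / 100
    return base, total_kbps - base


ratios = list(range(10, 100, 10))  # 10, 20, ..., 90 percent (assumed step)
allocations = {
    total: [split_rate(total, ratio) for ratio in ratios]
    for total in (500, 1000, 1500)
}

# Example: 60 percent of a 1000 kbit/s description goes to the base layer.
base_kbps, enhancement_kbps = split_rate(1000, 60)
```

A higher ratio favors the case in which only base layers are received, while a lower ratio leaves more bits to the enhancement layer, which matters when a full description is decoded.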
For each description, the total bitrate is set to 500, 1000, and 1500 kbit/s respectively, in order to find the optimal value at different rates. All the results of this subsection are provided using the football sequence for space reasons, but the same considerations also hold for all the other videos used in our tests. Figure 13 shows the performance when only one subsequence is received. Obviously, as we decode only the base
layer of one description, the quality of the reconstructed sequence improves as we increase the ratio. However, with a ratio above 60%, the improvement becomes very small as we reach the asymptotic value of the reconstruction algorithm.

Fig. 17. Optimal ratio, performance when receiving three subsequences (quincunx scheme). Curves: ILPS-MDSC, one full description plus the base layer of the other, at 500, 1000 and 1500 kbit/s.

Fig. 18. Optimal ratio, performance when receiving all the subsequences. Curves: ILPS-MDSC, by rows and quincunx, both descriptions, at 500, 1000 and 1500 kbit/s.

Figures 14 and 15 show the performance when receiving two subsequences, either a full description or the base layer of both descriptions. In those figures, we can see two different behaviors. When a full description is received, the maximum value seems to be achieved at a ratio between 50 and 60 percent. This indicates that it is better to generate balanced descriptors by assigning the available bitrate in an almost even way. Note that, since the proposed approach exploits the redundancy between base and enhancement layers, this means that the enhancement layer gets a rather large amount of bits. On the other hand, if we reconstruct the sequence starting from the base layers of both descriptors, the trends are more similar to those of the one subsequence figure. However, above 70% the gain is very low compared with the loss incurred when only one description is received instead. Figures 16 and 17 show the performance for three received subsequences.
Now, we always have one full description plus the base layer of the other one, so we observe at the same time the two different behaviors of the case with two received subsequences. As we can see, the performance seems to be more influenced by the rate of the enhancement layer than by the gain of having another base layer, so the maximum performance can be achieved with a ratio of about 60-70%. Finally, figure 18 shows the performance when both descriptors are fully received. As we can see, we reach the maximum performance with a ratio of 60%, thus confirming all the considerations above.

V. CONCLUSIONS

In this paper we introduced some novel algorithms to generate multiple descriptions with an H.264/SVC coder and have shown their performance compared to single description coding. Also, the second set of experiments shows that the best overall performance can be obtained by using a ratio of about 60-70% in almost every sequence considered. Work is in progress to improve these algorithms and to introduce them in real network scenarios to exploit their adaptability and robustness features. In addition, future work will introduce Fine Granular Scalability in each layer, in order to be more flexible at variable bitrates, and will look for analytic rate distortion functions that fit the rate distortion curves obtained.

REFERENCES

[1] V.K. Goyal, "Multiple Description Coding: Compression meets the network," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 74-93, Sept. 2001.
[2] J.G. Apostolopoulos, "Error-resilient video compression through the use of multiple states," Proc. IEEE ICIP 00, Vancouver, Canada, vol. 3, pp. 352-355, 2000.
[3] Y. Wang and S. Lin, "Error resilient video coding using multiple description motion compensation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 6, pp. 438-452, June 2002.
[4] A. Vitali, F. Rovati, R. Rinaldo, R. Bernardini and M. Durigon, "Low-Complexity Standard-Compatible Robust and Scalable Video Streaming over Lossy/Variable Bandwidth Networks," IEEE International Conference on Consumer Electronics, Las Vegas, USA, pp. 10-1025, Jan. 2005.
[5] S. Shirani, M. Gallant, and F. Kossentini, "Multiple description image coding using pre- and post-processing," IEEE Proceedings of the International Conference on Information Technology: Coding and Computing, Las Vegas, USA, pp. 35-39, April 2001.
[6] M. Liu and Ce Zhu, "Multiple description video coding using hierarchical B pictures," IEEE International Conference on Multimedia and Expo, Beijing, China, pp. 17-1370, July 2007.
[7] N. Franchi, M. Fumagalli, R. Lancini, and S. Tubaro, "Multiple Description Video Coding for Scalable and Robust Transmission Over IP," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 3, pp. 1-3, March 2005.
[8] Zhe Wei, Canhui Cai, and Kai-Kuang Ma, "A Novel H.264-based Multiple Description Video Coding Via Polyphase Transform and Partial Prediction," International Symposium on Intelligent Signal Processing and Communications, Yonago, Japan, pp. 151-154, December 2006.
[9] M. Folli and L. Favalli, "Multiple Description Coding algorithms for H.264 coder," Mobimedia 07, Nafpaktos, Greece, August 2007.
[10] H. Schwarz, T. Hinz, H. Kirchhoffer, D. Marpe, and T. Wiegand, "Technical description of the HHI proposal for SVC CE1," ISO/IEC JTC1/SC29/WG11, Doc. m114, Palma de Mallorca, Spain, Oct. 2004.
[11] R. Schafer, H. Schwarz, D. Marpe, and T. Wiegand, "MCTF and Scalability Extension of H.264/AVC and its applications to video transmission, storage and surveillance," Visual Communications and Image Processing, July 2005.