New Scalable Modalities in Multi-view 3D Video

Hoda Roodaki
Multimedia Processing Laboratory, School of Electrical and Computer Engineering, University of Tehran

Mahmoud Reza Hashemi
Multimedia Processing Laboratory, School of Electrical and Computer Engineering, University of Tehran

Shervin Shirmohammadi
Multimedia Processing Laboratory, School of Electrical and Computer Engineering, University of Tehran, and Distributed and Collaborative Virtual Environments Research Laboratory, School of Electrical Engineering and Computer Science, University of Ottawa

ABSTRACT
Both three-dimensional (3D) and multi-view video technologies have made noticeable progress and become more popular in recent years. 3D video expands the user's experience beyond conventional 2D video by adding the sensation of depth, while multi-view video shows the same scenery from different viewpoints. In both cases, a huge amount of data needs to be compressed and transmitted, making it challenging to support heterogeneous mobile devices with limited bandwidth and processing power. Scalable Multi-view Video Coding (SMVC) is one of the main techniques that address this challenge by scaling down the video. However, in addition to the conventional scalable modalities of temporal, spatial, quality, and complexity in 2D video, SMVC has many more modalities, adding a much higher dimension to the difficult decision making process in the video scalability engine. In this paper, we use Grounded Theory to systematically extract various scalable modalities in multi-view 3D video, and we find, in addition to some known modalities, some new modalities specifically for mobile multi-view 3D video. The usefulness of these scalable modalities in applications specific to mobile multi-view 3D video is also shown.

Categories and Subject Descriptors
I.4.2 [Image Processing and Computer Vision]: Compression (Coding) - Approximate methods.

General Terms
Algorithms.

Keywords
Multi-view video, Mobile 3D video, Scalable video.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MoVid '13, February 26-March 1, 2013, Oslo, Norway. Copyright 2013 ACM /13/02...$

1. INTRODUCTION
Recent advances in 3D generation and display technologies have paved the way to a variety of 3D video applications and devices that expand the users' experience beyond traditional 2D video. In recent years, multi-view video coding (MVC) has emerged as an enabling technology for applications such as 3D video, free-viewpoint video, immersive teleconferencing, and 3DTV. A multi-view video captures the same scene from different viewpoints, resulting in a richer user experience but also a significantly larger media size compared to traditional single view video. To reduce the size, the inherent temporal and inter-view dependencies among views have been exploited in standards such as H.264/AVC's Multi-view Video Coding (MVC) extension, to support 3D and multi-view video coding. At the same time, we are also experiencing a surge in multimedia-enabled mobile devices such as tablets, netbooks, and smartphones, which are becoming more powerful with improved CPU/GPU processing capability and speed, memory capacity, display resolution and quality, and ubiquitous broadband connectivity. 3D display technologies have also progressed rapidly, allowing users to view 3D multimedia on mobile devices even without the need for special glasses.
Several smartphones, such as the HTC EVO 3D and LG Optimus 3D, tablets such as Qualcomm's autostereoscopic 3D tablet showcased at the CTIA 2012 trade show, and mobile game consoles such as the Nintendo 3DS are able to display 3D content glasses-free. Despite the above progress, mobile devices are still less capable than other personal computers, especially in terms of battery life, which is a significant limiting factor. Bandwidth is another limitation even with the 100 Mbps bandwidth provided in 4G networks, since using more bandwidth contributes to more energy consumption and hence faster battery drainage. One possible solution to address these limitations is scalability: to reduce the amount of data in multi-view 3D video in order to better manage power and bandwidth consumption. With scalable video coding (SVC), such as the SVC extension of H.264/AVC, one can control the bitrate of the transmitted video in modalities such as temporal (frame rate), spatial (resolution), quality (PSNR), complexity [5], or a combination of them. SVC can hence support the distribution of video to heterogeneous receivers. Typically, a scalable bitstream consists of a Base layer that carries the minimum amount of video information necessary for all receivers. Then, one or more Enhancement layers can be built on top of the base layer to improve the overall video quality. However, selecting which modality to use for scalability is an important decision making problem. In other words, if a mobile client has an available bandwidth which is less than the bitrate of the original video, what type of video adaptation (temporal, spatial, quality, or a combination of them) should be performed in order to match the bitrate of the adapted video to the bandwidth of the mobile client? This is not an easy problem to solve, with much research effort such as [1] devoted to solving it for 2D video.
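As a minimal sketch of the base/enhancement layer idea described above, the following Python fragment picks the layers a receiver can afford for a given bandwidth. The `Layer` type, the layer names, and all bitrates are hypothetical values chosen for illustration; the paper does not prescribe this selection rule, and a real scalability engine would also have to decide which modality each layer represents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str          # hypothetical label, e.g. "base" or "enh-1"
    bitrate_kbps: int  # bitrate this layer adds on top of the layers below it


def select_layers(layers: List[Layer], bandwidth_kbps: int) -> List[Layer]:
    """Return the longest prefix of layers whose total bitrate fits the bandwidth.

    layers[0] is the base layer. Each later layer is assumed to depend on all
    earlier ones, so layers can only be dropped from the top of the stack.
    """
    selected: List[Layer] = []
    total = 0
    for layer in layers:
        if total + layer.bitrate_kbps > bandwidth_kbps:
            break
        selected.append(layer)
        total += layer.bitrate_kbps
    return selected


# Hypothetical three-layer stream and a 1000 kbps client.
stream = [Layer("base", 300), Layer("enh-1", 400), Layer("enh-2", 800)]
chosen = select_layers(stream, bandwidth_kbps=1000)
print([layer.name for layer in chosen])  # ['base', 'enh-1']
```

The greedy prefix rule reflects only the layering constraint; it says nothing about which modality yields the best perceived quality at that bitrate, which is the harder decision problem the text refers to.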

Scalable Multi-view Video Coding (SMVC), however, increases the dimension of this problem even more significantly, from 3 modalities to at least 10, as we shall show in this article (Table 2). It is therefore important to identify and understand the scalable modalities in SMVC before we can scale 3D and/or multi-view video in practice. However, there has been no attempt to systematically explore all applicable scalable modalities for multi-view 3D video. Without such exploration, it is not clear if and how these modalities may fit new 3D applications. Additionally, in order to measure the performance of the scaling down process, continuous quality assessment of the scalable multi-view sequence is necessary. Clearly, this quality assessment method should be determined according to each specific type of scalable modality [2]. Understanding scalable modalities is therefore important in several aspects of SMVC. In this article, we apply Grounded Theory to find scalable modalities in multi-view 3D video. Grounded Theory is a qualitative research method that inductively develops an understanding of a phenomenon [3]. The result of our work will enable researchers and practitioners to better understand scalability in multi-view 3D video and use appropriate scalable modalities for applications of multi-view 3D video. In section 2 we introduce the concept of scalability in multi-view 3D video, while section 3 shows our application of Grounded Theory in data collection and analysis of SMVC. Section 4 presents the outcomes of our work and verifies them with real-world examples. Finally, the paper ends in section 5 with concluding remarks.

2. SCALABILITY IN MULTI-VIEW 3D VIDEO
As mentioned above, scalability is an effective approach to reduce the amount of data in multi-view 3D video and to transmit the video in heterogeneous mobile environments.
In single view video, temporal, spatial, quality and complexity scalability and various combinations of them have been used in order to produce scalable bitstreams [4]. Region-Of-Interest (ROI) and object-based scalability are two other scalable modalities that have been used, though less frequently. In these modes, the partial bitstreams represent spatially connected regions of the original picture, and the other above-mentioned scalable modalities can be applied to these regions [6]. Finally, semantic scalability has been proposed to preserve the video content with higher subjective importance from bitrate reduction [7]. For 3D multi-view video, some methods have been proposed based on generalizing single view scalable modalities. In [8], the spatial resolution of each view is changed to accommodate various display capabilities. [9] supports region of interest scalability in multi-view video coding, and [10] proposes a power-scalable multi-view video encoding scheme by translating the encoder's computational complexity into power consumption. In addition to the above, several other scalable modes have been introduced specifically for multi-view 3D video. In one approach, view scalability enables the decoder to pick the number of views that should be decoded according to its resources [11]. In another approach, free view-point scalability provides a scalable bitstream structure to access partial bitstreams that generate selected views at the decoder side [12]. In stereoscopic video, scalability usually refers to keeping the non-stereoscopic bitstream as the base layer and putting the residual stereoscopic signal in one or more enhancement layers. In this context, the conventional spatial and temporal scalabilities may then be applied to each of these layers [13].
Finally, frame compatible video format [14] is a class of 3D video formats in which the left and right views are packed together in a single frame, each at half the resolution of the full coded frame, which can then also benefit from conventional scalability [15]. As we can see, features from single view, multi-view, 3D or stereo video are used in a rather ad-hoc manner to introduce the above scalable modalities. But there is no evidence that these scalable modalities are complete or sufficient for all multi-view 3D video applications. Consequently, we need to find scalable modalities that correctly represent the main characteristics of multi-view 3D video. In this paper, we will use Grounded Theory as a systematic approach to find these modalities.

3. GROUNDED THEORY DATA COLLECTION AND ANALYSIS
Grounded Theory is a qualitative research approach that inductively develops an understanding of a phenomenon. It consists of several steps and the precise execution of them will guarantee a reliable outcome. The main idea of this theory is to read (and re-read) comprehensive documentary materials and discover basic categories, concepts, properties and relationships among them [3]. Data collection and analysis is accomplished using three main steps: open coding, axial coding and selective coding. A detailed description of Grounded Theory and its steps is beyond the scope of this paper, and we refer interested readers to [3] for more details. In this section, we show how we applied these steps and what results we obtained. Our Grounded Theory data resources were 114 academic journal papers, proceedings papers, and doctoral dissertations related to video scalability, 3D video and multi-view video coding, which reflected the major research activities in this area during the last decade. In the next subsections, we will describe how we applied the three steps of Grounded Theory to our problem.
3.1 Open Coding Step
In this step, we should find the main concepts, categories and properties of our field of interest. The concepts can be determined through an iterative process and by referring to our textual data resources. The major purpose of each textual document, the main problems considered in those documents, and the proposed solutions that were presented in our references were considered in order to find the concepts. Once the concepts were determined, they were categorized at a higher level. What each concept represents and how it is similar to or different from the other ones is the main criterion for categorizing them at an upper level. Most of the time, concepts have different names but refer to the same idea. The main idea of each concept is therefore considered as its corresponding higher level category. We applied this process until we were convinced that a level of saturation in category identification had been reached. For instance, we identified a category called 3D video format drawn from the concepts of 3D content formats that correspond to each specific display type, the methods used to produce and display them, and novel methods to overcome existing limitations of 2D video, all of which were concepts discussed extensively in the literature. Similarly, highly discussed concepts of efficient compression of multi-view video, and 3D depth map motion estimation and compensation, lead to the identification of a Multi-view 3D video compression category. These two categories, plus the other 12 categories we obtained in this step, are shown in Figure 1.

Figure 1. The categories, subcategories and core category that were extracted using various steps of Grounded theory

3.2 Axial Coding Step
In this step, for each category we identified subcategories to cover all aspects of the related studies in the fields of video scalability, 3D video and multi-view video coding. The categories identified during the open coding step have an inherent hierarchy. This hierarchy should be recognized in this step to identify their subcategories. Subcategories are categories with particular and identifiable properties that can provide useful information for a higher order category. The subcategories are shown hierarchically in Figure 1.

3.3 Selective Coding Step
This step aims to reach a logical completion of the study by integrating all the work and providing some theoretical explanations of the phenomenon under study. The core category is the main problem and the central theme of the study. Hence, we have selected "scalable modalities" as our core category and the main issue that we follow. Understanding the relationships among emergent categories is not intuitive. Once a core category is determined, all other categories become sub-categories to the core. The sub-categories in the relational hierarchy become the core category descriptors that describe its properties, actions and interactions, importance, and the way of understanding the core category. The extracted relationships should show exactly how the corresponding concepts of one category can describe the core category. This conceptual relationship will help us get closer to our final goal. Table 1 shows a conceptual overview of the relationships and their corresponding concepts that we extracted. The first column shows the relationships between "scalable modalities" as the core category and the remaining categories extracted during our explorative research.
Here is an example of how to find the relationships. Compared to primitive single view video, 3D video formats have some new features, such as the frame compatible video format that was described in section 2. These features should be reflected in multi-view 3D scalable modalities for efficient use of scalability in corresponding new applications. In this case, the 3D video format category adds some requirement to the core category of scalable modalities. The defined relationship entitled "add requirement to" was selected according to this consideration. Similarly, features from each category were used to extract the proper relationships between them and the core category. These features are shown in the second column of Table 1.

Table 1. Relationship between Categories and Subcategories

Categories and (relationships) with the core category:
- Scalable video coding (accomplishes core)
- MVC and Scalable video standards
- Multi-view 3D video format (add scalable modalities to core)
- Multi-view 3D video quality
- Multi-view 3D video rendering
- Error protection in scalable video
- Scalability modes (supported by core)
- Multi-view 3D video transmission (utilizes core)
- Multi-view 3D object extraction
- Scalable decoder complexity
- Multi-view 3D compression
- Multi-view 3D Representation (facilitated by core)

Corresponding Concepts for extracting relationships:
- Object-based video coding
- Flexibility in video transmission
- Mixed resolution multiview 3D video format
- Depth perception
- Suppression theory
- Depth map information
- Side information for virtual view synthesis
- Transmission over error-prone networks
- Depth map information
- The bitstream adaptation heterogeneity
- Low complexity video decomposition

The obtained scalable modalities and their corresponding features are summarized in the first and second columns of Table 2, respectively.
Some of the scalable modalities in this table, including view scalability, free-view point scalability, asymmetric spatial scalability and frame compatible scalability, were introduced in section 2. The remaining ones, which have not been introduced before in the literature, and their corresponding features will be explained in more detail in the next section. As we can see from Table 2, there are 10 scalable modalities that

need to be taken into account when working with multi-view/3D video.

Table 2. Obtained Scalable Modalities and Their Features

Obtained scalable modality: Related features
View scalability: Flexibility in video transmission
Free-view point scalability: Bitstream adaptation to heterogeneous network conditions
Asymmetric spatial scalability (in stereoscopic video): Mixed resolution multiview 3D video format
Frame compatible scalability: Mixed resolution multiview 3D video format
Depth scalability: Depth map information
Complexity scalability: Bitstream adaptation to heterogeneous network conditions; Low complexity video decomposition
Level scalability: Bitstream adaptation to heterogeneous network conditions; Low complexity video decomposition
Depth-resolution scalability, or Depth-quality scalability: Depth map information (for stereoscopic video only)
Side information scalability: Side information for virtual view synthesis
Human perception scalability using Suppression Theory: Suppression theory

4. Results
In this section, we take a closer look at some of the scalable modalities shown in Table 2 and we show their necessity in real-world scenarios.

4.1 Depth scalability
Depth information is an important parameter in 3D video. It reflects the distance of objects in the scene from the camera. The partial bitstreams in this case correspond to different parts of the scene according to their distance from the camera, in other words their depth perception. As an example of depth scalability, the base layer may consist of the areas in the scene with the closest distance to the camera. The enhancement layers will then include the remaining areas. This way, all receivers will receive the areas with the closest distance and are able to render the foreground of the scene. Similar to any other scalability mode, each layer of depth scalability can be combined with any other scalability mode such as temporal, spatial, etc.
generating hybrid scalable modalities.

4.1.1 Example application
Large Scale Immersive Virtual Environment (LSIVE) [16] in a teaching hall is an example application of multi-view 3D video. In an LSIVE teaching hall, the participants will be represented in receivers as 3D avatars. The distance of the participants from each other determines the depth. The size and quality of the 3D avatar of the person who needs to be the center of attention should be higher than those of the other ones. Furthermore, LSIVE provides the capability for a large number of participants to interact through the use of multimedia. Hence, in LSIVE, each user is both an information source and a recipient. This requires the simultaneous interconnection between a large number of users through a variety of media content such as video, audio and text messaging. However, at any moment in time each participant usually interacts with a limited number of participants and needs to receive their content. Transmitting the whole content to all participants regardless of whether they are interested in receiving it or not is not only inefficient, but can also significantly affect the perceived quality by imposing unnecessary traffic which may result in network congestion, and requires high computational complexity from the receivers. LSIVE participants are distributed on several connection domains. The various bandwidth and processing power limitations of the receivers lead to strong heterogeneity. Scalability, and specifically the Depth Scalable modality, is a good approach to reduce the amount of data in this application. In the depth scalable modality for this scenario and for each specific participant, the base layer may consist of parts of the scene at a certain distance (depth level) from that participant (e.g. the distance to this person's center of attention). Enhancement layers will include the remaining parts (such as the less-important parts of the scene) in a hierarchical order.
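The layering by depth described in section 4.1 can be sketched as follows. This is only an illustration under stated assumptions: the depth map is a plain list of rows holding distances from the camera, the thresholds are hypothetical values in ascending order, and the function name `split_by_depth` is invented here. It shows the partitioning idea only, not how an encoder would actually code each layer, and it uses distance from the camera rather than distance from a participant's center of attention.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def split_by_depth(depth_map: List[List[float]],
                   thresholds: List[float]) -> Dict[int, List[Pixel]]:
    """Assign each pixel to a layer index according to its distance from the camera.

    thresholds must be in ascending order. Layer 0 (the base layer) receives
    pixels no farther than thresholds[0]; each further layer receives the next
    depth band; the last layer receives everything beyond the final threshold.
    """
    layers: Dict[int, List[Pixel]] = {i: [] for i in range(len(thresholds) + 1)}
    for r, row in enumerate(depth_map):
        for c, depth in enumerate(row):
            # The layer index is the number of thresholds this depth exceeds.
            index = sum(depth > t for t in thresholds)
            layers[index].append((r, c))
    return layers


# Hypothetical 2x3 depth map (arbitrary distance units) and two thresholds.
depth_map = [[1.0, 1.5, 6.0],
             [1.2, 4.0, 9.0]]
layers = split_by_depth(depth_map, thresholds=[2.0, 5.0])
print(layers[0])  # foreground pixels that every receiver gets
```

A receiver that decodes only layer 0 can render the foreground; decoding further layers adds progressively more distant parts of the scene.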
By receiving only the base layer, all receivers will be able to generate all views of their corresponding center of attention. Receivers with more resources can receive more enhancement layers and improve the depth level and the inherent perceptual quality accordingly. Figure 2 (a) shows the content of the base layer for the teacher of a virtual class in this scalable modality, where only one person in the center of attention can be seen, and (b) the base plus enhancement layers, where all parts of the scene can also be seen.

Figure 2. Two separate examples: Depth Scalability for a virtual class (left column) and Complexity Scalability with prediction structure (right column). For each scalable modality, respectively, (a) and (c) are the Base Layer and (b) and (d) are Base plus Enhancement Layers.

4.2 Complexity scalability
MVC proposes a prediction structure that allows each frame to be predicted not only from temporal references, but also from other views as a reference. This prediction structure is adaptive and the best predictor among temporal and inter-view references can be selected for each frame based on the rate-distortion-complexity cost [14]. Hence, the suggested prediction structure can be considered as a measure to control the decoding complexity explicitly for multi-view video coding. In complexity scalability, the partial bitstreams consist of different prediction structures, enabling the receiver to select the partial bitstream according to its computational complexity constraints. For instance, the base layer may consist of the simplest form of prediction structure, such as

simulcast coding [14]. The enhancement layers will then include more complex prediction structures by allowing inter-view prediction and consequently generating higher quality views at the cost of more computational complexity. Figure 2 (c) shows the proper prediction structure for the base layer in complexity scalability since it only uses simulcast coding with minimum computational complexity. In Figure 2 (d) the prediction structure with inter-view predictions and with higher computational complexity is suggested, and is suitable for enhancement layer coding in complexity scalability.

4.2.1 Example application
In Free View-point video (FVV), several candidate views exist and the viewer selects one of them, so the receiver does not require all the views. Considering the inherent inter-view dependencies used in the prediction process, it is clear that using the standard MVC prediction structure requires all views to be decoded at an end point in order to decode a specific view correctly. This is clearly not efficient or even possible in some cases. As a solution, the Complexity Scalable modality can be used to produce partial bitstreams with a restricted prediction structure. A specific view can then be decoded using this partial bitstream with lower computational cost but also at lower quality. It should be noted that although complexity scalability has been studied in the context of single view video before, the concept of complexity in multi-view 3D video is different, mostly due to inter-view prediction being the main source of computational complexity in multi-view coding. The complexity scalability mentioned in this article uses the concept of inter-view prediction.

4.3 Level scalability
Profiles and levels of a video coding standard such as H.264/AVC specify some limits on the capabilities needed to decode the bitstreams. For any given profile, levels usually correspond to decoder processing overhead and memory capability.
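To make the notion of a level limit concrete, the sketch below checks which level a single-view sequence fits into, using the maximum macroblock processing rate and maximum frame size that H.264/AVC defines for Levels 1, 1.1 and 2.1. Only these three levels are listed, and the helper names are hypothetical. Multi-view bitstreams are governed by the level definitions of the MVC extension, which this simplified sketch does not model.

```python
from typing import Optional, Tuple

LUMA_SAMPLES_PER_MACROBLOCK = 16 * 16  # a macroblock covers 16x16 luma samples

# level -> (max macroblocks per second, max macroblocks per frame),
# listed from the least to the most demanding level.
LEVEL_LIMITS = {
    "1": (1485, 99),
    "1.1": (3000, 396),
    "2.1": (19800, 792),
}


def macroblock_load(width: int, height: int, fps: int) -> Tuple[int, int]:
    """Return (macroblocks per frame, macroblocks per second) for one view."""
    mbs_wide = -(-width // 16)   # ceiling division
    mbs_high = -(-height // 16)
    per_frame = mbs_wide * mbs_high
    return per_frame, per_frame * fps


def lowest_level(width: int, height: int, fps: int) -> Optional[str]:
    """Return the first listed level whose limits cover the sequence, if any."""
    per_frame, per_second = macroblock_load(width, height, fps)
    for level, (max_rate, max_frame) in LEVEL_LIMITS.items():
        if per_second <= max_rate and per_frame <= max_frame:
            return level
    return None


# Level 1 expressed as a luma sample rate: 1485 * 256 = 380160 samples per second.
print(LEVEL_LIMITS["1"][0] * LUMA_SAMPLES_PER_MACROBLOCK)
print(lowest_level(176, 144, 15))  # QCIF at 15 frames per second
print(lowest_level(352, 288, 30))  # CIF at 30 frames per second
```

In a level-scalable bitstream, the base layer would be coded to satisfy the lowest of these limits, and each enhancement layer would target a higher level.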
In level scalability, the base layer may consist of the coded sequence with the lowest level number of any given profile, in order to support the minimum requirements of that profile. For example, for maximum processing rate, the base layer can be limited to 1,485 macroblocks per second (380,160 luma samples per second). Then, enhancement layers will include the sub-bitstreams that were coded using the other levels that are specified for the corresponding profile according to the standard. This way, each receiver can select the partial bitstream that meets its quality and resource constraints. Continuing the above example, Enhancement Layer 1 can then be limited to 3,000 macroblocks per second (768,000 luma samples per second), while Enhancement Layer 2 can be limited to 19,800 macroblocks per second (5,068,800 luma samples per second). Similar limits can be defined for other level parameters such as maximum frame size, maximum video bitrate, etc.

4.3.1 Example application
Mobile 3DTV is a mobile alternative for 3DTV, one of the main applications of 3D video, with advantages such as small screen technology and not needing special glasses. In 3DTV, viewers should be able to watch the 3D scene from different viewing angles and hence multiple views need to be decoded simultaneously. This approach requires extra processing power, which is not available in most mobile applications. Level Scalability can be used as a good solution for this purpose. In this case, it can produce various sub-bitstreams at different complexity levels. By receiving only the base layer that is coded with the minimum levels, all receivers will be able to generate all views with minimum computational complexity. Receivers with more resources can receive more enhancement layers and improve the inherent perceptual quality accordingly.

4.4 Depth-resolution/depth-quality scalability
Stereoscopic video is the simplest form of 3D video.
Color plus depth map based stereoscopic video has attracted significant attention since it can reduce storage and bandwidth requirements. Recent research shows that depth information can be compressed considerably without affecting the quality of the 3D experience [17]. In this scalable modality, the base layer may consist of the depth component at minimum spatial resolution or quality level. The enhancement layers will then include partial bitstreams with depth components at a higher spatial resolution or quality level to generate several video sequences with similar quality but different bitrates suitable for receivers with various bandwidth limitations. Figure 3 (a) and (b) show the base layer with lower resolution and lower quality of depth information, respectively. The enhancement layer will then contain the depth information at the original resolution and quality levels.

Figure 3. (a) Base and Enhancement layers in stereoscopic video depth-resolution scalability (b) Base and Enhancement layers in stereoscopic video depth-quality scalability

4.4.1 Example application
The benefits of stereoscopic 3DTV have been shown in many applications such as medical video processing, where depth impression enhances the perceptual viewing experience. Since medical video processing is highly sensitive to compression error, Depth-resolution/Depth-quality Scalability in stereoscopic video is a good solution for the problem of compressing such a huge amount of video data in this specific application. The base layer consists of a lower bitrate bitstream that contains the depth component with lower spatial resolution or quality without any harmful degradation in overall quality perception, while receiving enhancement layers can improve the overall quality effectively.

4.5 Side information scalability
Virtual view synthesis is a key technology for future free viewpoint video, one of the main applications of 3D video.
A synthesized multi-view video is created from neighboring views using some kind of side information. Clearly, the quality of the side information can strongly affect the quality of the synthesized views. This additional information must be sent along with the MVC bitstream, which adds to network traffic. In this scalable modality, the base layer may consist of one view and the side information with the minimum acceptable quality. The enhancement layers will then include the side information coded at multiple quality levels. Hence, each receiver can synthesize virtual views with satisfactory quality just by receiving the base layer, and can still improve the quality of its recreated views by receiving additional enhancement layers.

4.5.1 Example application
FVV is one of the main applications of multi-view video that allows the user to interactively control the viewpoint and even generate new views of a 3D scene. Side information scalability can be used to synthesize virtual views that were not captured originally. This scalable modality can be used for heterogeneous receivers with various bandwidth and processing power capabilities.

4.6 Human perception scalability using Suppression Theory
As we described before, in single view video, temporal, spatial, and quality scalability have been used in order to produce scalable bitstreams. But there is no systematic approach for extending the conventional scalable modalities from single view to multi-view 3D video. We have used a well-known concept in multi-view 3D video, named suppression theory, for this generalization. Suppression theory states that the overall perception quality in a stereoscopic video is determined by the higher quality view [18]. This can be generalized to multi-view 2D video and multi-view 3D video as follows. In multi-view 2D video, if the quality of a subset of views that are more important to the user is high, then the overall subjective quality for this user is rated high. Hence some spatial, SNR, or temporal scaling, or a combination of them, for the non-important views can provide acceptable perception performance with a reduced bitrate. In this case, the base layer may consist of a subset of important views with an acceptable spatial, temporal or quality level. Then, the enhancement layers include the other views at different spatial, temporal or quality levels. This type of extension results in reconstructed multi-view sequences with much lower bitrate but almost the same subjective quality. This type of scalability can be generalized to the multi-view 3D video format by applying some spatial, SNR or temporal scaling to a subset of left or right views that are captured through various viewing angles.
This type of scalable modality is most suitable for multi-view 2D or 3D video applications, which usually suffer from the inherent increase in storage and transmission requirements of multi-view video.

5. CONCLUSIONS

In this paper we took a closer look at various scalable modalities in multi-view 3D video. A large collection of multi-view 3D video research results from the last decade was examined and analyzed using the Grounded Theory approach. First, the main concepts of the data resources were identified through analysis of the literature. Then, the concepts were categorized, and the relationships between the categories were clarified by considering the corresponding concepts. Finally, a conceptual understanding of the phenomena under study was developed. It was demonstrated that the scalable modalities listed in Table 2 have distinct and meaningful uses, and must be considered when designing scalability solutions for specific applications of multi-view 3D video.

6. REFERENCES

[1] Hefeeda, M., and Hsu, C.H. 2008. Rate-Distortion Optimized Streaming of Fine-Grained Scalable Video Sequences. ACM Transactions on Multimedia Computing, Communications, and Applications. 4, 1 (January 2008).
[2] Roodaki, H., Hashemi, M.R., and Shirmohammadi, S. 2012. A New Methodology to Derive Objective Quality Assessment Metrics for Scalable Multi-view 3D Video Coding. ACM Transactions on Multimedia Computing, Communications, and Applications. 8, 3S (September 2012).
[3] Peng, Y., Kou, G., Shi, Y., and Chen, Z. 2006. A Systemic Framework for the Field of Data Mining and Knowledge Discovery. In IEEE International Conference on Data Mining Workshops (Hong Kong, China, December 18-22, 2006).
[4] Schwarz, H., Marpe, D., and Wiegand, T. 2007. Overview of the Scalable Video Coding Extension of the H.264/AVC Standard. IEEE Transactions on Circuits and Systems for Video Technology. 17, 9 (September 2007).
[5] Jeong, J., Byun, K., Kim, J., and Ko, S. 2010. A complexity scalable H.264 decoder with downsizing capability for mobile devices. IEEE Transactions on Consumer Electronics. 56, 2 (May 2010).
[6] Grois, D., Kaminsky, E., and Hadar, O. 2010. Dynamically adjustable and scalable ROI video coding. In IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (Shanghai, China, March 24-26, 2010).
[7] Lai, J.H., and Chien, S.Y. 2012. Semantic scalability using tennis videos as examples. Multimedia Tools and Applications. 59, 2 (July 2012).
[8] Vetro, A., Yea, S., Zwicker, M., Matusik, W., and Pfister, H. 2007. Overview of Multiview Video Coding and Anti-Aliasing for 3D Displays. In IEEE International Conference on Image Processing (San Antonio, Texas, USA, September 16-19, 2007).
[9] Lai, Y., Lan, X., Li, X., Liu, Y., and Zheng, N. 2011. Region of interest support in scalable multi-view video coding. In IEEE International Conference on Consumer Electronics (Las Vegas, NV, USA, January 9-12, 2011).
[10] Kang, L.W., and Lu, C.S. 2007. Low-complexity power-scalable multi-view distributed video encoder. In Picture Coding Symposium (Lisbon, Portugal, November 7-9, 2007).
[11] Shimizu, Sh., Kitahara, M., Kimata, H., Kamikura, K., and Yashima, Y. 2007. View Scalable Multiview Video Coding Using 3-D Warping With Depth Map. IEEE Transactions on Circuits and Systems for Video Technology. 17, 11 (November 2007).
[12] Ho, Y.S., and Oh, K.J. 2007. Overview of multi-view video coding. In EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services (Maribor, June 27-30, 2007).
[13] Tseng, B.L., and Anastassiou, D. 1994. Compatible video coding of stereoscopic sequences using MPEG-2's scalability and interlaced structure. In International Workshop on HDTV '94 (Torino, Italy, October 26-28, 1994).
[14] Vetro, A., Wiegand, T., and Sullivan, G. 2011. Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard. Proceedings of the IEEE. 99, 4 (January 2011).
[15] Vetro, A., Tourapis, A.M., Muller, K., and Chen, T. 2011. 3D-TV Content Storage and Transmission. IEEE Transactions on Broadcasting. 57, 2 (June 2011).
[16] Raad, M., Raad, R., and Safaei, F. 2011. Media rate control for large scale immersive communications. In IEEE International Conference on Multimedia and Expo (Barcelona, Spain, July 11-15, 2011).
[17] Leon, G., Kalva, H., and Furht, B. 2008. 3D Video Quality Evaluation with Depth Quality Variations. In 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (Istanbul, Turkey, May 28-30, 2008).
[18] Ozbek, N., and Tekalp, A.M. 2009. Quality Layers in scalable multi-view video coding. In IEEE International Conference on Multimedia and Expo (New York, NY, June 28 - July 2, 2009).
