PERCEPTUAL QUALITY COMPARISON BETWEEN SINGLE-LAYER AND SCALABLE VIDEOS AT THE SAME SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTIONS. Yuanyi Xue, Yao Wang


Department of Electrical and Computer Engineering, Polytechnic Institute of NYU, Brooklyn, NY 11201, U.S.A.
Email: yxue01@students.poly.edu, yao@poly.edu

ABSTRACT

In this paper, the perceptual quality difference between scalable and single-layer videos coded at the same spatial, temporal and amplitude resolutions (STAR) is investigated through a subjective test using a mobile platform. Three source videos are considered, and for each source video, single-layer and scalable videos are compared at 9 different STARs. We use paired comparison methods with and without a tie option. Results collected from 10 subjects in the without-tie option and 6 subjects in the with-tie option show that there is no significant quality difference between scalable and single-layer video when coded at the same STAR. An analysis of variance (ANOVA) test is also performed to further confirm this finding.

Index Terms: Perceptual video quality, paired comparison, scalable video

1. INTRODUCTION

Scalable video coding with spatial, temporal and amplitude scalability offers video servers and clients the flexibility to choose appropriate video layers according to the network bandwidth and the user preference. Given a bandwidth constraint, the spatial resolution (controlled by frame size), temporal resolution (controlled by frame rate) and amplitude resolution (controlled by the quantization parameter) can be adjusted to achieve the best perceptual quality. However, scalable coding has not been widely adopted in commercial applications so far, because of its complexity and its reduced coding efficiency compared to single-layer coding.
Most existing video streaming architectures use multiple copies of single-layer coded video at different STARs, and the system sends the version coded at a particular STAR based on the network condition. It is therefore interesting and useful to know whether there is any quality difference between single-layer and scalable videos coded at the same STAR. In [1, 2], we investigated the impact of STAR on perceptual quality and derived a model relating perceptual quality to STAR. It is of interest to see whether the same model is also applicable to non-scalable video. In this work, we report results from subjective tests that compare the perceived quality of single-layer and scalable video coded at the same STAR combination. We design our subjective tests based on the paired comparison method [3] and conduct them on a mobile platform with a 4.1-inch WVGA (854×480) touch screen running the Android OS.

The remainder of this paper is organized as follows: Section 2 introduces the test interface, the test video pool and the test methodology. Section 3 presents and analyzes the subjective test results. We conclude this work in Section 4.

2. TESTING INTERFACE AND METHODOLOGY

2.1. Testing interface

The subjective tests are conducted on TI's Zoom2 mobile development platform, equipped with a 4.1-inch WVGA multi-touch screen. The interface uses Android's own video playback library (Android SDK), with Java and XML controlling the high-level program flow and the UI. For details on the user interface design, please see [1, 2, 4].

2.2. Sequence preparation

Three videos, city, soccer, and foreman, from the standard test video database¹ are used in the test. All videos are originally at 4CIF (704×576) spatial resolution with a frame rate of 30 Hz, and each sequence is 8 seconds long (240 frames).
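These 4CIF sources are reduced to CIF (352×288) and QCIF (176×144) by one or two rounds of 2:1 decimation in each direction, using a sine-windowed sinc filter [5]. The Python sketch below illustrates sine-windowed sinc decimation of a plane stored as a list of rows. The 12-tap length, the sine-window formula, the half-sample output phase and the edge clamping are illustrative assumptions, not the normative JSVM downsampling filter.

```python
import math


def sine_windowed_sinc_kernel(num_taps=12):
    """Low-pass kernel for 2:1 decimation (hypothetical tap count and window).

    The ideal low-pass response with cutoff at the new Nyquist frequency
    (a quarter of the input sampling rate) is tapered by a sine window.
    With an even tap count the kernel is centred between two input samples.
    """
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = (n - center) / 2.0  # time axis scaled by the decimation factor 2
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = math.sin(math.pi * (n + 0.5) / num_taps)  # sine window
        taps.append(sinc * window)
    total = sum(taps)
    return [c / total for c in taps]  # normalize to unit DC gain


def decimate_row(row, kernel):
    """Filter and keep every second sample; output k lies between inputs 2k and 2k+1."""
    half = len(kernel) // 2
    out = []
    for k in range(len(row) // 2):
        base = 2 * k - half + 1
        acc = 0.0
        for i, c in enumerate(kernel):
            j = min(max(base + i, 0), len(row) - 1)  # clamp indices at picture edges
            acc += c * row[j]
        out.append(acc)
    return out


def decimate_plane(plane, kernel):
    """Separable 2:1 decimation: horizontal pass, then vertical pass."""
    rows = [decimate_row(r, kernel) for r in plane]
    cols = [decimate_row(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Applying `decimate_plane` once to a 704×576 plane gives 352×288 (CIF), and applying it again gives 176×144 (QCIF). Rounding and clipping back to 8-bit sample values are omitted for brevity.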
A sine-windowed sinc function, the downsampling filter recommended for H.264/SVC [5], is used to generate videos at CIF and QCIF spatial resolutions. The JSVM 9.18 [6] encoder is used to generate both single-layer and layered video. The GOP size is 16 frames in all cases; we chose this GOP size to align with the subjective tests reported in our earlier works. We investigate the effect of scalable coding in each dimension (i.e., spatial, temporal, or amplitude scalability) separately, while fixing the resolutions of the other two dimensions at their highest levels.

¹ Available at ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/

Fig. 1. Comparison snapshots for single-layer and scalable videos at the same STAR: (a) city, 4CIF/QP36/30 Hz, 53rd frame; (b) foreman, CIF/QP28/30 Hz, 164th frame; (c) soccer, 4CIF/QP28/15 Hz, 75th frame.

Fig. 2. Scaled absolute difference maps for the three videos: (a) city, MAD = 1.9299; (b) foreman, MAD = 0.9875; (c) soccer, MAD = 1.26.

Specifically, to examine the effect of spatial scalability, we code all videos at the highest temporal and amplitude resolutions (frame rate 30 Hz, QP 28). To create non-scalable videos at different spatial resolutions (SRs), we separately code pre-downsampled input videos at QCIF, CIF, and 4CIF resolution using the JSVM encoder in single-layer mode. To create scalable videos, we create a three-layer bitstream using the JSVM encoder with only spatial scalability invoked, with QCIF as the base layer.

For temporal scalability, we fix the SR at 4CIF and the QP at 28, and produce temporally scalable videos using the JSVM encoder with the hierarchical-B temporal prediction structure, with the base layer corresponding to 3.75 Hz and additional enhancement layers leading to 7.5, 15, and 30 Hz, respectively. The non-scalable versions are created by coding the pre-downsampled input video at 7.5, 15, and 30 Hz in non-scalable mode using the I15BP structure, with only the first frame coded in I mode. Thus, the temporally scalable videos at lower frame rates have a higher I/P-frame ratio than the corresponding single-layer videos. No QP cascading is used when temporal or spatial scalability is invoked.

Finally, to test amplitude scalability (commonly known as quality or SNR scalability), Coarse Grain Scalability (CGS) is used, with the base layer at QP 44 and additional layers at QP 36 and 28, respectively.
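The dyadic hierarchical-B structure with a 16-frame GOP is what provides the 3.75, 7.5, 15 and 30 Hz extraction points described above. The sketch below assigns each frame (indexed in display order) a temporal level and keeps the frames needed for a target frame rate. The function names, the display-order indexing and the treatment of levels 0 and 1 together as the 3.75 Hz base layer are assumptions for illustration, not the JSVM extractor's interface.

```python
import math

FULL_RATE_HZ = 30.0
GOP_SIZE = 16  # dyadic hierarchical-B GOP, as in the tests above


def temporal_level(frame_idx, gop_size=GOP_SIZE):
    """Temporal level of a frame (display order) in a dyadic hierarchical-B GOP.

    Key pictures (every gop_size frames) get level 0; each halving of the
    picture distance adds one level, so odd positions get log2(gop_size).
    """
    pos = frame_idx % gop_size
    max_level = gop_size.bit_length() - 1  # log2(16) = 4
    if pos == 0:
        return 0
    level = max_level
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level


def frames_for_rate(num_frames, target_hz, full_hz=FULL_RATE_HZ, gop_size=GOP_SIZE):
    """Indices of the frames kept when extracting the sub-stream at target_hz."""
    max_level = gop_size.bit_length() - 1
    dropped_levels = round(math.log2(full_hz / target_hz))  # each level halves the rate
    keep = max_level - dropped_levels
    return [i for i in range(num_frames) if temporal_level(i, gop_size) <= keep]
```

For the 240-frame test sequences, this keeps 30, 60, 120 and 240 frames at 3.75, 7.5, 15 and 30 Hz, respectively.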
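The conclusion measures amplitude resolution by the inverse of the quantization step size. For H.264/AVC-style quantizers, the nominal step size doubles every 6 QP values, so the CGS operating points QP 44, 36 and 28 can be mapped to step sizes as sketched below. The sketch uses the commonly tabulated nominal step sizes for QP 0 to 5; the function names are hypothetical, and the codec itself realizes quantization through integer scaling tables rather than this floating-point step size.

```python
# Nominal H.264/AVC quantizer step sizes for QP 0..5; the step size doubles
# every 6 QP values.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Nominal quantization step size for an H.264/AVC luma QP in [0, 51]."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in [0, 51]")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def amplitude_resolution(qp):
    """Amplitude resolution taken as the inverse of the quantization step size."""
    return 1.0 / qstep(qp)


if __name__ == "__main__":
    for qp in (44, 36, 28):  # CGS base layer, then the two enhancement layers
        print(f"QP {qp}: step {qstep(qp):g}, amplitude resolution {amplitude_resolution(qp):.4f}")
```

Under these assumptions, the three CGS layers correspond to step sizes of 104, 40 and 16, so each 8-QP refinement shrinks the step size by a factor of roughly 2.5.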
For the single-layer counterpart, we directly code the video at each QP (QP 28, 36 and 44 individually). Table 1 summarizes the test points examined in the different cases. The coded bitstreams are then extracted and decoded into YUV format, and the CIF and QCIF streams are upsampled to 4CIF for display on our mobile device using a 6-tap half-pel and bilinear quarter-pel interpolation filter [7]. Finally, single-layer and scalable videos coded at the same STAR are concatenated in both orders (single-layer shown first, and scalable shown first) with a 3-second grey-out interval (R = G = B = 192) in between.

Table 1. Test points
Common parameters    Test parameters
QP28, 30Hz           4CIF, CIF and QCIF
4CIF, 30Hz           QP28, QP36 and QP44
QP28, 4CIF           30Hz, 15Hz and 7.5Hz

2.3. Methodology

To examine whether there is a perceptual difference between single-layer and scalable video coded at the same STAR, the paired comparison method [3] is used. In paired comparison, a subject views two consecutive videos separated by a grey-out interval and is then asked to rate which video has better perceived quality. There are two approaches to designing subjective tests using paired comparison: 2-forced-choice without a tie option, and 3-forced-choice with a tie option. In this work, we conduct our subjective tests using both methods. Note that the 2-forced-choice test without tie is similar to the methodology used in just noticeable difference (JND) tests; when counting the votes, we use the 75% JND criterion. We use both methods (without tie and with tie) to provide a degree of cross-validation of the votes that subjects give. If there is no perceptual difference between single-layer and scalable videos, we should expect
the votes in the without-tie test to be roughly equal, and the with-tie test to give a considerable share of its votes to the tie option. Running both tests also guards against design flaws: if each test has a small but non-trivial probability of a design flaw (p_1 and p_2, respectively), the probability that two intrinsically independent tests are both flawed is only p_1 p_2, which is significantly lower.

Each subject views a concatenated pair in a randomly generated order (either single-layer first or scalable first). Then, (1) in the without-tie test, he/she chooses which one (the first or the second) has better quality, even if he/she cannot decide; (2) in the with-tie test, he/she may choose the tie option if the perceived quality of the two is the same. The subject can replay the current pair as many times as he/she wishes before rating. For each pair of videos at a particular STAR combination, two occurrences are shown, and which one (single-layer or scalable) is shown first is randomly determined by the test interface. This double rating is designed to reduce random-choice bias. Each subject rates all test points in the session; the total number of test points is 27 (3×3×3). With double rating, each subject therefore views and rates 54 nineteen-second sequences (each pair contains two eight-second processed video sequences (PVSs) with a 3-second interval in between).

3. RESULTS AND ANALYSIS

Ten subjects with normal vision participated in the 2-forced-choice test, and 6 subjects with normal vision participated in the 3-forced-choice test. For each test point, the votes for single-layer and for scalable videos are counted separately. To give an intuitive feel for the PVSs, Fig. 1 shows a set of snapshots of encoded scalable and single-layer videos at the same STAR, and Fig. 2 shows their absolute difference images, scaled for display so that the differences are more clearly visible. Each pair of videos looks perceptually very similar, although the pixel differences are non-zero.

Table 2 provides the counting results for the 2-forced-choice test. As mentioned in Section 2.3, the 2-forced-choice test can be seen as a special case of a JND test. If the hypothesis that there is a just noticeable difference in perceived quality were accepted, the winning frequency of the better-quality video would have to be at least 75% under the 75% JND condition, that is, at least 15 votes for a particular video at a particular STAR combination, since each video pair is viewed 20 times (10 subjects × 2 ratings). In Table 2, the only such occurrence is city at QP44/30Hz/4CIF. It is therefore reasonable to conclude that there is no significant difference in perceptual quality between the scalable and single-layer videos at the STARs examined.

Table 2. Votes for the 2-forced-choice test without tie option. Each cell gives single-layer/scalable votes (20 votes per cell); All S, All T and All Q are totals over the spatial, temporal and amplitude test points.
         city     soccer   foreman  All videos
4CIF     13/7     8/12     8/12     29/31
CIF      8/12     14/6     12/8     34/26
QCIF     9/11     10/10    11/9     30/30
All S    30/30    32/28    31/29    93/87
30Hz     9/11     12/8     10/10    31/29
15Hz     13/7     9/11     11/9     33/27
7.5Hz    6/14     7/13     12/8     25/35
All T    28/32    28/32    33/27    89/91
QP28     10/10    11/9     10/10    31/29
QP36     8/12     13/7     12/8     33/27
QP44     5/15     11/9     14/6     30/30
All Q    23/37    35/25    36/24    94/86

To further examine the statistical significance of the rating differences, we conducted a one-way ANOVA in each of the three dimensions (spatial, temporal and amplitude resolution) separately; the results are shown in Table 4. For each ANOVA, we test the null hypothesis that the votes given to the single-layer videos and to the scalable videos are drawn from populations with the same mean.

Table 4. p-values and F-values of the ANOVA tests for the without-tie test
             p-value   F-value
Spatial      0.5458    0.38
Temporal     0.5549    0.36
Amplitude    0.4946    0.49
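The scaled difference maps in Fig. 2 can be produced along the following lines. The sketch assumes that MAD denotes the mean absolute difference between co-located samples of the two decoded frames (for example, their luma planes), and that each map is stretched linearly so the largest difference is displayed as white; the paper does not state its scaling rule, so that rule and the function name are assumptions.

```python
def abs_diff_map(frame_a, frame_b, scale=None):
    """Return (MAD, display map) for two equally sized 8-bit planes (lists of rows).

    MAD is the mean of |a - b| over all samples. The display map multiplies
    each |a - b| by `scale` (by default, 255 / largest difference) and clips
    the result to 255.
    """
    diffs = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    count = sum(len(r) for r in diffs)
    mad = sum(sum(r) for r in diffs) / count
    if scale is None:
        peak = max(max(r) for r in diffs)
        scale = 255.0 / peak if peak else 1.0  # identical frames give an all-black map
    display = [[min(255, round(d * scale)) for d in r] for r in diffs]
    return mad, display
```

Under this reading, the MAD values in Fig. 2 (about 1 to 2 grey levels) are small compared with the 0 to 255 sample range.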
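Both the 75% JND check and the one-way ANOVA can be computed directly from the vote counts in Table 2. The sketch below assumes that each ANOVA group consists of the nine per-video, per-resolution vote counts of one coding mode within a dimension; under that assumption it reproduces the F-values listed in Table 4 for the spatial (0.38) and amplitude (0.49) cases. Only the F statistic is computed here; if SciPy is available, `scipy.stats.f_oneway(single, scalable)` returns the F statistic together with its p-value. The function names are hypothetical.

```python
import math

# Table 2 votes as (single-layer, scalable) pairs for city, soccer, foreman.
TABLE2 = {
    "spatial": {
        "4CIF": [(13, 7), (8, 12), (8, 12)],
        "CIF": [(8, 12), (14, 6), (12, 8)],
        "QCIF": [(9, 11), (10, 10), (11, 9)],
    },
    "temporal": {
        "30Hz": [(9, 11), (12, 8), (10, 10)],
        "15Hz": [(13, 7), (9, 11), (11, 9)],
        "7.5Hz": [(6, 14), (7, 13), (12, 8)],
    },
    "amplitude": {
        "QP28": [(10, 10), (11, 9), (10, 10)],
        "QP36": [(8, 12), (13, 7), (12, 8)],
        "QP44": [(5, 15), (11, 9), (14, 6)],
    },
}
VIDEOS = ("city", "soccer", "foreman")


def jnd_hits(table, votes_per_cell=20, criterion=0.75):
    """Cells where one version wins at least `criterion` of the votes (15 of 20)."""
    threshold = math.ceil(criterion * votes_per_cell)
    hits = []
    for dim, levels in table.items():
        for level, cells in levels.items():
            for video, (single, scalable) in zip(VIDEOS, cells):
                if max(single, scalable) >= threshold:
                    hits.append((video, dim, level))
    return hits


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def dimension_groups(table, dim):
    """Split one dimension's cells into single-layer and scalable vote lists."""
    cells = [pair for pairs in table[dim].values() for pair in pairs]
    return [s for s, _ in cells], [c for _, c in cells]
```

Here the only flagged cell is city at QP44, matching the text, and both F-values lie far below the 5% critical value of F(1, 16), which is about 4.49.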
In all cases, the p-values are larger than 0.05, indicating that there is no significant difference between videos coded in single-layer and scalable modes; that is, the coding scheme is not a factor that determines the votes. We also show the box plots of the ANOVA tests in Fig. 3. In the box plots, the central red mark is the median of the data, the notches in the box represent the 95% confidence interval of the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points. The 95% confidence intervals of the medians overlap, indicating no perceived quality difference between single-layer and scalable coded videos. In retrospect, this also indicates that although we have a limited number of subjects, their decisions are coherent, and we therefore consider the number of subjects sufficient for answering our question.

Fig. 3. Box plots for the ANOVA tests for spatial, temporal and amplitude resolution; on the x-axis, 1 indicates single-layer video and 2 indicates scalable video.

Table 3 shows the counting results for the 3-forced-choice test. At every test point, the majority of votes are given to the tie option, indicating that viewers could not tell the difference between single-layer and scalable coded video at the same STAR.

Table 3. Votes for the 3-forced-choice test with tie option. Each cell gives single-layer/scalable/tie votes (12 votes per cell); All S, All T and All Q are totals over the spatial, temporal and amplitude test points.
         city      soccer    foreman   All videos
4CIF     1/1/10    0/2/10    1/3/8     2/6/28
CIF      1/2/9     2/1/9     0/2/10    3/5/28
QCIF     2/2/8     1/0/11    0/0/12    3/2/31
All S    4/5/27    3/3/30    1/5/30    8/13/87
30Hz     2/2/8     1/3/8     2/1/9     5/6/25
15Hz     3/1/8     2/3/7     3/2/7     8/6/22
7.5Hz    2/2/8     1/1/10    2/3/7     5/6/25
All T    7/5/24    4/7/25    7/6/23    18/18/72
QP28     1/2/9     2/3/7     2/2/8     5/7/24
QP36     2/2/8     1/3/8     0/1/11    3/6/27
QP44     1/1/10    2/2/8     1/0/11    4/3/29
All Q    4/5/27    5/8/23    3/3/30    12/16/80

4. CONCLUSION

This paper reports results from a perceptual quality assessment comparing single-layer and scalable video coded at the same spatial, temporal and amplitude resolutions (STAR). The subjective test was conducted using paired comparison with and without a tie option, with double rating.
Ratings from ten subjects were collected for the without-tie option, and from 6 subjects for the with-tie option. The test results show that at the same STAR there is no significant perceptual quality difference between single-layer and scalable coded video, both from the vote counts and from the ANOVA test. Although the single-layer and scalable videos are generated using H.264/AVC- and H.264/SVC-compliant codecs, respectively (both implemented via the JSVM encoder under different settings), we believe the conclusion may hold generally for any videos coded at the same STAR, regardless of the encoding method. Note that here we measure the amplitude resolution by the inverse of the quantization step size. We consider two videos to have the same amplitude resolution if they are quantized using the same type of quantizer with the same quantization step size. One important consequence of our finding is that the Q-STAR model developed in our prior work [2], which models perceptual quality as a function of STAR, is applicable to both scalable and non-scalable video.

5. REFERENCES

[1] Yuanyi Xue, Yen-Fu Ou, Zhan Ma, and Yao Wang, "Perceptual Video Quality Assessment on a Mobile Platform Considering Both Spatial Resolution and Quantization Artifacts," in Proc. of PacketVideo, Dec. 20.
[2] Yen-Fu Ou, Yuanyi Xue, Zhan Ma, and Yao Wang, "A Perceptual Video Quality Model for Mobile Platform Considering Impact of Spatial, Temporal, and Amplitude Resolutions," in IEEE IVMSP Workshop on Perception and Visual Signal Analysis, Jun. 2011.
[3] R. A. Bradley and M. E. Terry, "Rank analysis of incomplete block designs: I. The method of paired comparisons," Biometrika, vol. 39, pp. 324-345, 1952.
[4] Yuanyi Xue, "Perceptual Quality Assessment of H.264/SVC on Spatial Resolution and Quantization," Master's Thesis, Polytechnic Institute of NYU, June 20.
[5] G. Sullivan and S. Sun, "AHG Report on Spatial Scalability Resampling," Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-Q007, Oct. 2005.
[6] "Joint Scalable Video Model," Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-X202, Jul. 2007.
[7] E. Francois et al., "Generic Extended Spatial Scalability," Oct. 2004.