PERCEPTUAL VIDEO QUALITY ASSESSMENT ON A MOBILE PLATFORM CONSIDERING BOTH SPATIAL RESOLUTION AND QUANTIZATION ARTIFACTS

Proceedings of IEEE 2010 18th International Packet Video Workshop, December 13-14, 2010, Hong Kong

PERCEPTUAL VIDEO QUALITY ASSESSMENT ON A MOBILE PLATFORM CONSIDERING BOTH SPATIAL RESOLUTION AND QUANTIZATION ARTIFACTS

Yuanyi Xue, Yen-Fu Ou, Zhan Ma, Yao Wang
Department of Electrical and Computer Engineering, Polytechnic Institute of NYU, Brooklyn, NY, U.S.A.
Email: {yxue, you, zma3}@students.poly.edu, yao@poly.edu

ABSTRACT

In this paper, we investigate the impact of spatial resolution and quantization on the perceptual quality of compressed video coded with the SVC reference software JSVM. Two subjective quality tests have been carried out on the TI Zoom2 mobile development platform (MDP). Five different spatial resolutions (from QCIF to 4CIF) and four different quantization levels (QPs) are chosen to create the processed video sequences (PVSs) from seven different video sources. The first test focuses on the effect of spatial resolution at a fixed QP, while the second test explores the impact of both spatial resolution and quantization artifacts. Videos coded at different spatial resolutions are displayed at the full screen size of the mobile platform. We report the test results in terms of how the mean opinion score (MOS) decreases as the spatial resolution decreases at constant QP, and how the MOS decreases as the QP increases at the same spatial resolution. We find that the dropping rate of the MOS vs. spatial resolution increases as the QP increases, and that the dropping rate of the MOS vs. QP increases as the spatial resolution decreases. In addition, the dropping rate against spatial resolution is faster when the video has lower motion, while the dropping rate against QP is faster when the video has less texture detail. Also, the dropping rate of the MOS against spatial resolution can be captured by a one-parameter falling exponential function in most cases.

Index Terms: Perceptual quality, spatial resolution, quantization, scalable video

1. INTRODUCTION

For a video bitstream, three primary parameters control the bandwidth requirement: quantization stepsize (amplitude resolution), frame rate (temporal resolution) and frame size (spatial resolution). Given the bandwidth limit of a receiver, the encoder or a network transcoder has to decide at which spatial, temporal, and amplitude resolution (STAR) to code or transcode a video so as to provide the best perceptual quality. It is therefore important to understand the impact of the STAR on the perceptual quality. However, studying the joint impact of all three dimensions on the perceptual quality is a complex and challenging task because of the high dimensionality of the problem. In our previous work [1], [2], we considered the joint impact of temporal and amplitude resolutions under a fixed spatial resolution (CIF). In this paper, we focus on the interaction of the spatial and amplitude resolutions under a fixed temporal resolution (30 Hz).

The works in [3] and [4] conducted perceptual quality experiments on the impact of spatial resolution and quantization artifacts. However, the quality assessment in [3] only includes 3 different spatial resolutions and was not carried out on mobile devices, while [4] only considers spatial resolutions between QCIF and CIF. State-of-the-art smartphones are often equipped with screens whose spatial resolution goes beyond CIF. In this paper, we therefore cover the spatial resolution range from QCIF to 4CIF, along with four different QPs, to examine their impact on the perceptual quality.
In our previous works, the perceptual video quality was evaluated on a larger screen; however, we believe that the form factor and the display screen may affect the viewing experience, so it is important to use a display environment similar to that of actual mobile video users during the subjective test. In this work, we conduct the tests on a mobile platform (the Zoom2 from TI) with a 4.1-inch screen at WVGA (854x480) resolution.

The rest of this paper is organized as follows. Section 2 introduces the mobile platform and the subjective test interface. Section 3 gives the test protocol and the methodology for post-processing the collected data. Section 4 presents and analyzes the results of the subjective tests. We conclude our work in Section 5.

2. MOBILE DEVELOPMENT PLATFORM AND SUBJECTIVE RATING INTERFACE

2.1. Mobile Development Platform

Targeting wireless mobile applications, we choose the TI Zoom2 mobile development platform (MDP) [5] to perform our subjective tests. The Zoom2 MDP runs on the powerful TI OMAP34x dual-core processor, which enables real-time H.264/AVC decoding and is widely deployed in popular commercial smartphones such as the Palm Pre and the Nokia N900. Besides, the Zoom2 MDP provides a 4.1-inch WVGA (854x480) touch screen, similar to the popular mobile devices on the market.

2.2. Operating System and Subjective Rating Interface

The Google Android [6] mobile operating system (OS) is selected as our MDP OS because of its popularity and its open-source nature. Android is developed on top of the GNU/Linux kernel and uses Java to provide the user interface. We can easily write high-level Java code to command low-level processes, for example video playback, and we can also develop our own video codec library and plug it into the kernel for high-level applications. In our current design, we develop the high-level subjective rating interface and use the default video codec library provided by the Android system for real-time H.264/AVC-compliant video playback.

Fig. 1 illustrates the subjective rating interface on our Zoom2 MDP. A welcome screen is shown to every new viewer at the beginning of each test to record his/her basic information, such as name, age, gender, and whether he/she is an expert in video quality rating. Then, a randomly selected processed video sequence (PVS) is rendered full-screen for viewing; one PVS playback usually takes about 10 seconds. A rating screen prompts right after the PVS playback to record the subjective rating. We implement a 10-level rating protocol [7], as shown in Fig. 1(c). We assume that a button below the first one would indicate a totally useless video, which no viewer would choose, so the effective rating scale has 11 points, as recommended by ITU [7], with scale 0 corresponding to "useless" (please refer to Section 3.3 for further explanation). In order to remove rating noise as much as possible, we allow the subject to replay the PVS just watched if he/she does not feel confident enough to give a proper judgement.

Fig. 1. Screenshots of the subjective rating interface on the TI Zoom2 MDP: (a) welcome screen; (b) video playback screen; (c) rating screen.
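As a rough sketch of the session flow described above (this is not the authors' Java/Android implementation; all function names here are hypothetical placeholders), the single-stimulus logic with random presentation order and optional replay can be expressed as:

    import random

    # Illustrative sketch of the single-stimulus rating flow:
    # random PVS order, a 10-level rating screen, and an optional replay.
    def run_session(pvs_list, play, ask_rating, ask_replay):
        """pvs_list: processed video sequences; play/ask_*: UI callbacks."""
        ratings = {}
        order = random.sample(pvs_list, len(pvs_list))  # random presentation order
        for pvs in order:
            while True:
                play(pvs)                    # ~10 s full-screen playback
                if not ask_replay():         # viewer may replay if unsure
                    break
            ratings[pvs] = ask_rating()      # integer score on the 10-level scale
        return ratings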

3. PERCEPTUAL QUALITY ASSESSMENT AND DATA POST-PROCESSING

In this section, we first describe the test video pool and then explain the details of the subjective rating protocol applied in our experiments. We further present the data post-processing method used to screen the raw subjective ratings so as to reduce rating noise.

3.1. Test Video Pool

Seven different videos, five at 4CIF (704x576) and two at VGA (640x480) original resolution, are chosen for the subjective tests. (We use VGA for FlowerGarden and one other sequence because we do not have 4CIF sources for them.) Three other sequences, In To Tree, Shields and Football, are used as training sequences. Two of them, In To Tree and Shields, are cropped from the original 720p high-definition (HD) sources to fit the Zoom2 MDP screen, and Football is in VGA. All the test and training sequences are shown in Fig. 2. Low-resolution (CIF and QCIF) videos are downsampled using the sine-windowed sinc function recommended in the SVC reference software JSVM [8, 9]. The videos are selected from the standard video pool to include various content activities, such as high motion, low motion, rich texture, and camera panning; one can refer to Fig. 3 for their SI and TI measures [10]. Besides, these videos are widely used for evaluating the performance of video compression algorithms.

Fig. 3. SI and TI measures for the test video pool.

In order to verify whether the VGA sequences are visually equivalent to the 4CIF sequences, we performed a pre-test involving six viewers. We first create a 4CIF-aspect-ratio (11:9) sequence by cropping and interpolating from the VGA sequence, and then upsample it to 4CIF, called VGA-derived 4CIF. The pre-test compares five pairs of videos; each pair contains the original 4CIF version and the VGA-derived version. All the videos are re-interpolated so that their height equals 480, the vertical resolution of the Zoom2 screen. Specifically, for each pair, we not only ask the viewer to rate each sequence, but also ask whether he/she can tell the difference between the two. In Table 1, the ratings for the VGA-derived sequences are normalized by the ratings for the native 4CIF sequences. The results, along with the yes ("Y") or no ("N") responses, show that there is no noticeable difference between the VGA-derived and native 4CIF versions of the same content.

Table 1. Results of the pre-test comparing native and VGA-derived 4CIF sequences for six viewers (A-F): each entry is the rating of the VGA-derived version divided by the rating of the native reference (shown as 1.00), together with the Y/N response; the last row shows the average ratio and the number of "Y" responses for each sequence.

The SVC reference software JSVM 9 [8] is used to generate the spatially scalable bitstreams. The input videos are encoded at 30 frames per second (fps) with a fixed GOP (group of pictures) length, and only the first frame is intra coded. Except for one sequence, which is slightly shorter, all sequences have the same number of frames, so each PVS lasts about 10 seconds at 30 Hz playback. Finally, all the coded sequences are interpolated to 4CIF spatial resolution using the AVC 6-tap half-pel with bilinear quarter-pel interpolation filter developed for SVC [11]. The Zoom2 automatically resizes these sequences during the tests so that their height equals 480.
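For reference, the SI and TI measures of Fig. 3 are defined in ITU-T P.910 [10] as the maxima over time of the standard deviation of the Sobel-filtered luma frames and of the successive frame differences, respectively. A minimal sketch, assuming numpy and scipy are available (illustrative code, not part of the paper):

    import numpy as np
    from scipy import ndimage

    def si_ti(frames):
        """frames: iterable of 2-D luma arrays (one video). Returns (SI, TI) per ITU-T P.910."""
        si, ti, prev = [], [], None
        for f in frames:
            f = f.astype(np.float64)
            gx = ndimage.sobel(f, axis=1)          # horizontal gradient
            gy = ndimage.sobel(f, axis=0)          # vertical gradient
            si.append(np.hypot(gx, gy).std())      # spatial information of this frame
            if prev is not None:
                ti.append((f - prev).std())        # temporal information (frame difference)
            prev = f
        return max(si), max(ti)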
3.2. Test Protocol

Two separate experiments are carried out: one focuses on the perceptual impact of spatial resolution, and the other on the impact of both spatial resolution and quantization artifacts. Besides the common QCIF (176x144), CIF (352x288) and 4CIF (704x576) resolutions, two other intermediate resolutions are also included, in Test 1 only. To reduce the experimental complexity of the second test, we consider only three levels each of spatial resolution (QCIF, CIF and 4CIF) and quantization (three QP levels). In order to facilitate the mapping of the test results between Test 1 and Test 2, we also include selected sequences from Test 1 in Test 2, coded with the lowest QP. Table 2 lists the spatial resolution and quantization configurations of the two tests. Overall, the PVSs of each video content span five different spatial resolutions and four QPs.

Table 2. Spatial resolution and quantization parameter (QP) settings used in our tests.
  Test 1: 176x144 (QCIF), two intermediate sizes, 352x288 (CIF), 704x576 (4CIF); the lowest QP.
  Test 2: 176x144 (QCIF), 352x288 (CIF), 704x576 (4CIF); the three higher QPs.

We use the Single Stimulus (SS) method for both tests, which is recommended by [7] as the presentation method for video assessment tests. We use a 1-10 discrete rating scale without a reference sequence (as shown in Fig. 1). Although the users are presented only 10 scales, the test is equivalent to the 11 scales recommended by ITU [7], with scale 0 corresponding to "useless".

Fig. 2. Test video pool for the subjective tests: five test sequences at 4CIF, FlowerGarden and one other test sequence at VGA, and the training sequences Shields and InToTree (cropped from 720p) and Football (VGA).

We did not use grouping in either test. The phrase "grouping" denotes a presentation method in which all quality variations of one test sequence are shown in successive order. We ran a series of pre-tests to verify whether we could benefit from grouping. We chose FlowerGarden (FG for short) and two other sequences as test material, at three spatial resolutions (4CIF, CIF and QCIF) and three different QPs, giving nine variations per sequence. The SS method with the 1-10 rating scale and no reference is used, with the sequences presented in pseudo-random order. After scaling the ratings to 0-10, the standard deviations of the user ratings in the grouping and no-grouping cases are calculated and given in Table 3. There is no consistent difference between the two cases, so we chose not to use grouping in the subjective tests.

Table 3. Average standard deviation (STD) of the ratings for each pre-test sequence, with ("w") and without ("wo") grouping.

3.2.1. Test 1: Impact of spatial resolution

In Test 1, we include all five spatial resolution variations of each sequence. The training session contains two sequences, In To Tree and Shields, with all five variations of spatial resolution. In the training session, each viewer is told the quality level (highest/2nd/3rd/4th/lowest) of each sequence shown, so that he or she gets a sense of the quality range to be expected. In the test session, each viewer is asked to give an overall rating for each sequence after viewing it. The order of the test presentations is random, and the actual test time for each viewer is within the range of the recommendations [7].

3.2.2. Test 2: Impact of both spatial resolution and quantization

In Test 2, we examine three spatial resolutions (4CIF, CIF and QCIF) and three QPs. Eight PVSs from Test 1 (coded with the lowest QP) are also included as common sequences between Test 1 and Test 2. Before the test session of Test 2, there is a similar training session which allows users to become familiar with the interface and the quality range they are going to see. Five training sequences are chosen for Test 2 such that they cover the highest and lowest quality and some intermediate levels, as listed in Table 4.

Table 4. Training session sequences in Test 2: In To Tree, Shields and Football at various QPs and spatial resolutions; the last two conditions (Shields and Football) are at QCIF.

The common set between Test 1 and Test 2 is established for mapping Test 1 onto Test 2, which together cover four QPs and five spatial resolutions. We choose the common set so that it covers the whole range of spatial resolutions: eight sequence-resolution combinations spanning 4CIF down to QCIF (including FlowerGarden at QCIF), all coded with the lowest QP, are included in Test 2. The mapping procedure is introduced in Section 3.3.3.

The number of all test presentations in Test 2 is 71, with a total playing length of around 12 minutes. If we were to include all these videos in a single viewing session, each test would be too long. Therefore, we divide the whole session into 7 subsessions, with overlapping sequences between subsessions. Each subsession includes all 9 versions of each of 3 different videos plus the common sequences, together with 5 training sequences, leading to a 40-sequence subsession instead of 71. Every viewer views and rates one subsession.

3.3. Data Processing

3.3.1. Data Collection

In both tests, each PVS is rated by the same number of viewers, evenly split between male and female. All viewers have normal visual and color perception, and most of them are non-experts with no background in video processing. After obtaining the raw ratings, the averaging post-screening pass described in Section 3.3.2 is performed; then we calculate the mean and standard deviation of all the scores given by each viewer. Based on these two measures, we convert the raw ratings to Z-scores by applying the following equation [12]:

  Z_kij = (X_kij - MEAN(X_i)) / STD(X_i)    (1)

Here, X_kij and Z_kij denote the raw rating and the Z-score of the k-th sequence at the j-th variation from the i-th viewer, X_i denotes all ratings from the i-th viewer, and MEAN and STD represent the average and the standard deviation of a given set. The Z-scores then go through the BT.500-11 post-screening described in Section 3.3.2. Finally, they are scaled back to [0, 10] using the following equation:

  X_kij,scl = (MED(X_max) - MED(X_min)) * (Z_kij - Z_min,i) / (Z_max,i - Z_min,i) + MED(X_min)    (2)

Here, X_max and X_min denote the sets of maximum and minimum rating values over all viewers, Z_max,i and Z_min,i denote the maximum and minimum Z-scores of the i-th viewer, and MED represents the median of a given set. Finally, we average the scaled Z-scores over all viewers for each PVS to obtain its mean opinion score (MOS).
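A compact sketch of the normalization in Eqs. (1) and (2), assuming the ratings are collected in a viewers-by-variations array (illustrative code, not the authors' implementation):

    import numpy as np

    def z_scores(X):
        """X[i, v]: raw rating of viewer i for PVS variation v (Eq. 1, per viewer)."""
        return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

    def rescale(Z, X):
        """Scale each viewer's Z-scores back to the rating range (Eq. 2)."""
        med_max = np.median(X.max(axis=1))      # MED(X_max): median of per-viewer maxima
        med_min = np.median(X.min(axis=1))      # MED(X_min): median of per-viewer minima
        zmin = Z.min(axis=1, keepdims=True)     # Z_min,i per viewer
        zmax = Z.max(axis=1, keepdims=True)     # Z_max,i per viewer
        return (med_max - med_min) * (Z - zmin) / (zmax - zmin) + med_min

    # MOS of each PVS: average of scaled Z-scores over viewers, e.g.
    # mos = rescale(z_scores(X), X).mean(axis=0)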
3.3.2. Post Screening

It is well known that subjective tests are prone to noise, since many factors can change a viewer's rating of a test sequence. We perform two post-screening passes on the collected data. The BT.500-11 pass is the same as the ITU recommendation in [7]: we set β as recommended by [7], and the P and Q matrices are calculated for each video content. An outlier is a viewer whose ratings exceed the threshold twice or more; we then discard all ratings of that viewer for that video sequence.

The averaging pass makes use of the fact that, if the viewer is consistent, a video coded at a lower spatial resolution or a higher QP should not be given a rating higher than a video coded at a higher spatial resolution or a lower QP. In particular, we calculate the ratio between the ratings of every two test videos with adjacent spatial resolutions or QPs. If the ratio is greater than a threshold T, we count it as an outlier. For each source video and each viewer, we count the number of outliers; if this number equals or exceeds another threshold N_outlier (N_outlier = 3 in our case), we remove all ratings of this viewer for this source video. For each remaining pair of ratings with a ratio greater than 1 but less than T, we average the two ratings for this viewer. For example, if a viewer gives ratings of 4 and 5 to two test videos at a lower and a higher QP, respectively, both at the same spatial resolution, we change both ratings to 4.5.

3.3.3. Datasets Combining

We use the common set between Test 1 and Test 2 to form a linear mapping function that maps the ratings from Test 1 onto the Test 2 scale, using the method recommended in [13]. The mapping function is derived by minimizing the sum of squared prediction errors and is shown in Fig. 4. After mapping the scores from Test 1 onto Test 2, we calculate the MOS of each common sequence as a weighted average of the mapped MOS from Test 1 and the MOS from Test 2, where the weights are proportional to the number of viewers remaining after post-screening in each test.

Fig. 4. Z-score scatter plot and mapping curve of the common sequences in Test 1 and Test 2, using the linear mapping y = ax + b.
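The combining step can be sketched as follows, assuming z1 and z2 hold the scores of the common PVSs in the two tests and n1 and n2 are the viewer counts after post-screening (hypothetical variable names, for illustration only):

    import numpy as np

    def combine(z1, z2, n1, n2):
        """Fit y = a*x + b by least squares, map Test 1 onto the Test 2 scale,
        then weight by the number of viewers kept after post-screening."""
        a, b = np.polyfit(z1, z2, 1)            # minimizes the sum of squared errors
        mapped = a * np.asarray(z1) + b         # Test 1 scores on the Test 2 scale
        return (n1 * mapped + n2 * np.asarray(z2)) / (n1 + n2)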

4. RESULTS

We first plot the MOS data vs. spatial resolution to give a flavor of the general quality decay trend. From Fig. 5 we can see that all seven sequences show a similar quality decay trend as the spatial resolution decreases, for all QPs. In addition, the quality ratings at the two lowest QPs are very similar for most sequences.

Fig. 5. MOS vs. spatial resolution (SP) at different QPs.

In order to observe the quality degradation as the spatial resolution (SP) decreases at a fixed QP, we normalize the MOS by the MOS at the highest spatial resolution under the same QP, denoted hereafter the normalized MOS vs. SP. Fig. 5 shows the measured MOS vs. SP at different QPs for the different sequences. The decay of the MOS with SP appears to follow a falling exponential trend. Based on this observation, we propose a one-parameter falling exponential model,

  Q(s) = (1 - e^(-b*s/s_max)) / (1 - e^(-b))    (3)

to capture the dropping trend. Here s is the number of pixels of the given spatial resolution and s_max is the number of pixels at the highest SP (here 4CIF, 704x576, or 405,504 pixels); b is the model parameter. The curves in the first 7 subplots of Fig. 6 are predicted using the proposed model. Note that the model parameter b reflects the dropping rate, with a smaller value corresponding to a faster drop. As can be seen, the model fits most of the data quite well, but overall it tends to overestimate the MOS at higher SPs (especially the second highest) and to underestimate it at lower SPs. We are currently investigating alternative model functions and comparing their accuracy.

Each of the first 7 subplots of Fig. 6 contains the normalized MOS under different QPs for the same video sequence. We can see that the quality drops faster under a larger QP. This may be due to the fact that a larger QP introduces more visible blurring artifacts than a lower QP at the same SP. Note that for most sequences the drop rate is very similar under the two lowest QPs, but for some sequences, such as FlowerGarden and one other sequence with very fine details, the results at the second-lowest QP are closer to those at the third QP; such content is more susceptible to the blurring introduced by quantization.

In the last subplot of Fig. 6, we plot the normalized MOS of all 7 source videos at the lowest QP, to reveal the impact of video content on the dropping trend. Under the same QP, the sequences with relatively low motion drop faster, while those with larger motion drop more slowly. So, at a fixed QP, the data suggest that the human eye is more sensitive to SP reduction for videos with low motion, and vice versa. This is consistent with the well-known spatio-temporal masking properties of the human visual system.
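The per-sequence, per-QP parameter b of Eq. (3) can be obtained by least-squares fitting; a sketch with made-up sample values (the actual fitted b values appear in Fig. 6):

    import numpy as np
    from scipy.optimize import curve_fit

    S_MAX = 704 * 576  # pixels at the highest spatial resolution (4CIF)

    def q_model(s, b):
        """One-parameter falling exponential of Eq. (3)."""
        return (1 - np.exp(-b * s / S_MAX)) / (1 - np.exp(-b))

    # Least-squares fit of b for one sequence at one QP (illustrative values).
    s = np.array([176 * 144, 352 * 288, 704 * 576])   # QCIF, CIF, 4CIF in pixels
    q = np.array([0.55, 0.80, 1.00])                  # normalized MOS (made-up numbers)
    (b_hat,), _ = curve_fit(q_model, s, q, p0=[5.0])
    print(b_hat)  # smaller b means a faster drop as resolution decreases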

Fig. 6. Normalized MOS vs. spatial resolution (SP). Points are the measured data; curves are predicted by the model in Eq. (3), with the model parameter b derived by least-squares fitting. The first 7 subplots compare the normalized MOS under different QPs for the same video source; the last subplot compares the normalized MOS of the different video sources under the same (lowest) QP.

Fig. 7. Normalized MOS vs. quantization stepsize (QS). The first 7 subplots compare the normalized MOS under different spatial resolutions for the same video source; the last subplot compares the normalized MOS of the different video sources under the same spatial resolution (4CIF).
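For reference, the QP-to-stepsize mapping used below follows the well-known H.264/AVC rule that the quantization stepsize doubles every 6 QP (illustrative snippet):

    def qstep(qp: int) -> float:
        """H.264/AVC quantization stepsize: Qstep = 2**((QP - 4) / 6)."""
        return 2 ** ((qp - 4) / 6)

    # e.g. qstep(28) = 16.0 and qstep(34) = 32.0: +6 QP doubles the stepsize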

The impact of video content on the dropping rate of the normalized MOS at the other QPs is similar.

Similarly, to better understand the quality degradation across sequences as the quantization stepsize (QS) increases, we normalize the MOS by the MOS at the lowest QS under the same spatial resolution, denoted the normalized MOS vs. QS. We map QP to QS using the function q_step = 2^((QP - 4)/6), as defined in the H.264/AVC codec. Each of the first 7 subplots of Fig. 7 compares the normalized MOS under different spatial resolutions for one video content. We can see that, for most sequences, the MOS drops faster with QS when the SP is lower. This suggests that the human eye is more sensitive to quantization noise when the SP is lower. To investigate the impact of video content on the dropping rate, the last subplot of Fig. 7 compares all the video contents under the same SP (4CIF). The sequences with less texture detail have the lowest normalized MOS, while highly textured sequences such as FlowerGarden are mostly on top. So, at a fixed spatial resolution, the human eye is more sensitive to quantization-induced distortion when there is less texture detail. A similar trend is observed at CIF resolution, whereas the trend at QCIF is inconsistent.

5. CONCLUSION

In this work, we performed a large set of subjective tests on the impact of spatial resolution and quantization on perceptual video quality. The subjective tests were performed on a mobile platform to mimic the real application environment. We found that the dropping rate of the MOS with spatial resolution depends on the quantization, with a faster drop associated with a higher QP. This dropping rate also depends on video content: e.g., sequences with low motion have a faster drop rate. Similarly, the dropping rate of the MOS with quantization increases when the spatial resolution decreases, and it increases when there is less texture detail. We also observed that the normalized MOS vs. spatial resolution curves can be modeled quite well by an inverted (falling) exponential function, where the model parameter depends on the QP and the video content. The model shows a consistent dependency between the model parameter and the quantization, but for some sequences or at some QPs the exponential function cannot model the decay trend well. For the normalized MOS vs. QS, the trend is not very consistent: at QCIF, an exponential decay appears to be a good model, but at CIF and 4CIF a sigmoidal function may be more appropriate. We are currently exploring better analytical models to represent both relationships.

6. REFERENCES

[1] Y.-F. Ou, Z. Ma, and Y. Wang, "Perceptual quality assessment of video considering both frame rate and quantization artifacts," to appear, IEEE Trans. Circuits and Systems for Video Technology.

[2] Y. Wang, Z. Ma, and Y.-F. Ou, "Modeling rate and perceptual quality of scalable video as functions of quantization and frame rate and its application in scalable video adaptation," in Packet Video Workshop, 2009.

[3] D. Wang, F. Speranza, A. Vincent, T. Martin, and P. Blanchfield, "Towards optimal rate control: A study of the impact of spatial resolution, frame rate, and quantization on subjective video quality and bit rate," in Proc. SPIE VCIP, vol. 5150, 2003.

[4] C. S. Kim, S. Dongjun, T. M. Bae, and Y. M. Ro, "Measuring video quality on full scalability of H.264/AVC scalable video coding," IEICE Trans. on Communications, vol. E91-B, no. 5, May 2008.

[5] TI Zoom2 MDP. [Online]. Available: http://www.omapzoom.org/platform.html

[6] Google Android. [Online]. Available: http://www.android.com/
[7] Rec. ITU-R BT.500-11, "Methodology for the subjective assessment of the quality of television pictures."

[8] Joint Scalable Video Model, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-X, Jul. 2007.

[9] S. Sun and J. Reichel, "AHG report on spatial scalability resampling," Jan. 2006.

[10] Rec. ITU-T P.910, "Subjective video quality assessment methods for multimedia applications."

[11] E. Francois et al., "Generic extended spatial scalability," Oct. 2005.

[12] A. M. van Dijk, J. B. Martens, and A. B. Watson, "Quality assessment of coded images using numerical category scaling," in Proc. SPIE, vol. 2451, 1995.

[13] M. H. Pinson and S. Wolf, "Techniques for evaluating objective video quality models using overlapping subjective data sets," NTIA Technical Report TR-09-457, Nov. 2008.

(Footnote: This is due to the downsampling followed by upsampling; see Section 3.1.)