Margaret H. Pinson mpinson@its.bldrdoc.gov
Introductions
- Institute for Telecommunication Sciences
  - U.S. Department of Commerce
  - Technology transfer
  - Impartial
  - Basic research
- Margaret H. Pinson
  - Video quality research
  - Standards leadership
  - Over 20 years
4/25/2012
What We Do
- Video quality
  - Measure quality of transmitted moving video
  - Focus on transmission
  - Various transmission and display systems
  - Entertainment and video conferencing
- Subjective testing
  - Carefully designed experiments
  - Ask people their opinion
  - Expensive and time consuming
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data: subjective tests
  - ITS calibration algorithms
  - ITS objective video quality models
- Video Quality Experts Group (VQEG)
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Metrics for Analog Video
- Test cards
  - Static patterns
  - Adjust television and camera
- RCA Indian Head test pattern
- SMPTE color bars
Metrics for Digital Video
- Quality changes
- Variable delay
- Scene matters
- Network impairments
[Diagram: digital video transmission system; Video Input -> Encoder -> Digital Channel -> Decoder -> Video Output]
Types of Metrics
- Full reference (FR)
- Reduced reference (RR)
- No reference (NR)
- Bit-stream
- Hybrid perceptual / bit-stream
Types of Metrics
[Diagram: transmission system showing where Full Reference (FR) and Reduced Reference (RR) metrics tap the video]
Types of Metrics
[Diagram: transmission system showing tap points for Full Reference (FR), Reduced Reference (RR), No Reference (NR), bit-stream, and hybrid perceptual / bit-stream metrics]
ITS Approach to Video Metrics
- Reduced reference (RR)
  - Needs access to the original video
  - Bandwidth between input / output
    - Side channel carried with the video
    - Separate channel
- Reasons
  - Accuracy
  - In-service monitoring
Steps to an Objective Video Quality Model
- Truth data
- Calibrate impaired video
- Model perception
- Validation
Truth Data, 1992 to 1999
- 11 subjective video quality tests
  1. 42 clips
  2. 105 clips
  3. 112 clips
  4. 90 clips
  5. 90 clips
  6. 90 clips
  7. 90 clips
  8. 600 clips
  9. 132 clips
  10. 164 clips
  11. 48 clips
- Standard definition (NTSC), CIF
- Variety of systems
  - 128 kb/s to 45 Mb/s
  - Network errors
  - Digital codecs
  - Analog noise
- 9 to 10 second clips
- Train one model on 11 datasets?
Why Subjective Datasets Cannot Be Directly Compared
[Scatter plots: raw data versus fitted data]
M. Pinson and S. Wolf, "An objective method for combining multiple subjective data sets," SPIE Video Communications and Image Processing Conference, Lugano, Switzerland, July 2003.
Iterated Nested Least-Squares Algorithm (INLSA)
- Combine multiple subjective datasets
- Pick objective video quality metrics
  - Accurate enough to be meaningful
  - Several metrics to reduce error
- INLSA finds a mapping
  - Linear fit
  - Single subjective scale
S. D. Voran, "An iterated nested least-squares algorithm for fitting multiple data sets," NTIA Technical Memorandum TM-03-397, Oct. 2002.
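The alternation at the heart of INLSA can be sketched in a few lines of Python: iterate between (a) defining a common scale by a least-squares fit of the pooled, remapped subjective scores against an objective metric, and (b) refitting each dataset's linear map onto that common scale. This is a toy, single-metric illustration under assumed interfaces (dicts of lists), not the algorithm as specified in Voran's memorandum, which handles several metrics and weighting.

```python
def _linfit(x, y):
    """Least-squares line y ~ m*x + c."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return m, my - m * mx

def inlsa(subj, obj, iters=20):
    """Toy INLSA sketch: subj and obj map dataset name -> per-clip
    subjective scores and objective metric values. Returns a linear
    map (a, b) per dataset that puts its scores on one common scale."""
    maps = {k: (1.0, 0.0) for k in subj}   # start with identity maps
    for _ in range(iters):
        # Common scale: fit pooled remapped scores against the metric.
        xs = [v for k in subj for v in obj[k]]
        ys = [maps[k][0] * s + maps[k][1] for k in subj for s in subj[k]]
        m, c = _linfit(xs, ys)
        # Refit each dataset's linear map onto the common scale.
        for k in subj:
            target = [m * v + c for v in obj[k]]
            maps[k] = _linfit(subj[k], target)
    return maps
```

With two synthetic datasets that rated the same underlying qualities on differently stretched scales, the returned maps bring both onto one scale.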
Steps to an Objective Video Quality Metric
- Truth data
- Calibrate impaired video
- Model perception
- Validation
ITS Calibration Algorithms
- Full reference calibration
- Reduced reference calibration
- PSNR exhaustive search
Calibration
- Time alignment
- Spatial scaling
- Spatial shift
- Valid region
  - SD, HDTV, 3DTV
  - Border not seen
- Luma gain & offset
- Color gain & offset
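Of the calibration steps above, luma gain and offset is the simplest to illustrate: model the processed pixels as gain * original + offset and solve by least squares. The sketch below is a pure-Python illustration with hypothetical names, not the ITS implementation, which estimates gain/offset on spatially aligned, subsampled frames.

```python
def luma_gain_offset(orig, proc):
    """Estimate luma gain and offset of a processed clip relative to
    the original, modelling proc ~ gain * orig + offset and solving
    by ordinary least squares over paired pixel values."""
    n = len(orig)
    mean_o = sum(orig) / n
    mean_p = sum(proc) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(orig, proc))
    var = sum((o - mean_o) ** 2 for o in orig)
    gain = cov / var
    offset = mean_p - gain * mean_o
    return gain, offset
```

Removing the estimated gain and offset from the processed video before feature extraction keeps a simple brightness/contrast change from being scored as a quality impairment.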
Full Reference Calibration
- Design goals
  - Accurate
  - Fast
- Validated and standardized
  - ANSI T1.801.03-2003
  - ITU-T Rec. J.144
  - ITU-R Rec. BT.1683
- Fully disclosed
S. Wolf and M. Pinson, "Video quality measurement techniques," NTIA Report 02-392, June 2002.
M. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Transactions on Broadcasting, September 2004.
Reduced Reference Calibration
- Design goals
  - Run in-service with a data channel
  - Similar run time
  - Slightly less accurate, but not by much
- 22 kb/s for standard definition
- 525-line spatial scaling
- Validated and standardized
  - ITU-T Rec. J.244
- Fully disclosed
M. H. Pinson and S. Wolf, "Reduced reference calibration algorithms," NTIA Technical Report TR-06-433, October 2005.
PSNR Exhaustive Search Calibration
- Design goal: very accurate
- Maximize peak signal-to-noise ratio (PSNR)
  - Exhaustive search of spatial & temporal shifts
  - Ideal luma gain/offset
- Painfully slow
  - Limits the search range
- Baseline PSNR measurement
- ITU-T Rec. J.340 (2010)
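The exhaustive search can be sketched as a brute-force loop over candidate shifts, keeping whichever alignment minimizes mean squared error (equivalently, maximizes PSNR). This minimal pure-Python sketch covers spatial shifts only and uses hypothetical names; the J.340 procedure also searches temporal shifts and fits luma gain/offset at each candidate.

```python
def mse(ref, deg):
    """Mean squared error between two equal-length pixel lists."""
    return sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)

def best_shift(ref_frame, deg_frame, width, max_shift=1):
    """Exhaustively search horizontal/vertical shifts of the degraded
    frame against the reference; return the (dx, dy) that minimizes
    MSE over the overlapping region. Frames are flat row-major lists."""
    height = len(ref_frame) // width
    best = (None, float("inf"))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r_px, d_px = [], []
            for y in range(height):
                for x in range(width):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        r_px.append(ref_frame[y * width + x])
                        d_px.append(deg_frame[yy * width + xx])
            err = mse(r_px, d_px)
            if err < best[1]:
                best = ((dx, dy), err)
    return best[0]
```

Even this tiny loop shows why the full search is "painfully slow": cost grows with the product of the spatial range, the temporal range, and the frame size.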
Steps to an Objective Video Quality Metric
- Truth data
- Calibrate impaired video
- Model perception
- Validation
Perceptual Video Quality Models
- Objective model
  - Predicts human perception
  - Judgment
- Flexible
  - New scenes
  - New video coders
  - Network errors
Objective Models
- Peak signal-to-noise ratio (PSNR)
- NTIA General Model (VQM)
- Developer's Model
- Low Bandwidth Model
- Fast Low Bandwidth Model
- Video Quality Model for Variable Frame Delay (VQM_VFD)
Peak Signal-to-Noise Ratio (PSNR)
- Logical extension of signal-to-noise ratio
- Full reference metric
  - Pixel-by-pixel comparison
  - Usually luma only
- PSNR = 20 log10(255 / RMSE(Yin, Yout))
- Logarithmic decibel scale
- Not perceptual
- Needs perfect calibration
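The formula above translates directly into code. A minimal pure-Python sketch (function name and flat-list interface are illustrative; real implementations operate on decoded luma planes):

```python
import math

def psnr(ref, deg, peak=255.0):
    """Peak signal-to-noise ratio between two equal-size luma frames.

    ref, deg: flat lists of pixel values (luma plane only, as is typical).
    peak: maximum pixel value (255 for 8-bit video).
    """
    assert len(ref) == len(deg) and len(ref) > 0
    mse = sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)
    if mse == 0:
        return float("inf")  # identical frames
    return 20 * math.log10(peak / math.sqrt(mse))

# A frame where every pixel differs by 5 gray levels has RMSE = 5,
# so PSNR = 20 * log10(255 / 5), about 34.15 dB.
print(round(psnr([100, 100, 100, 100], [105, 105, 105, 105]), 2))
```

Note that PSNR says nothing about where the error sits, which is one reason it correlates less well with perception than the models below.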
Peak Signal-to-Noise Ratio (PSNR)
- Many formulae used, often not reported
  - Color PSNR
  - Individual frame delay
  - Sub-sampled video
  - Different peak value (235 versus 255)
  - Frame PSNR, then average
- Accuracy?
ITS Model Design Overview
1. Perceptual filter
2. Spatial-temporal (S-T) regions
3. Features
4. Perceptibility masking
5. Comparison (original versus processed)
6. Separate gains from losses
7. Parameter
   - Often worst 5%
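Steps 5 to 7 above can be sketched compactly: compare per-region feature values between original and processed video, split the comparisons into gains (feature increased, e.g. added edges) and losses (feature decreased, e.g. blurring), then pool each side with a worst-case average. The names, the log-ratio comparison, and the pooling details below are illustrative assumptions, not the published parameter definitions.

```python
import math

def worst_fraction(vals, high, frac=0.05):
    """Average the worst `frac` of the values: the largest if high=True
    (gains), the most negative if high=False (losses)."""
    vals = sorted(vals, reverse=high)
    k = max(1, int(round(frac * len(vals))))
    return sum(vals[:k]) / k

def compare_features(orig, proc):
    """Compare per-S-T-region feature values (e.g. edge energy) between
    original and processed video; return (gain, loss) parameter values
    pooled over the worst 5% of regions."""
    ratios = [math.log10(max(p, 1e-6) / max(o, 1e-6))
              for o, p in zip(orig, proc)]
    gains = [r for r in ratios if r > 0] or [0.0]
    losses = [r for r in ratios if r < 0] or [0.0]
    return (worst_fraction(gains, high=True),
            worst_fraction(losses, high=False))
```

Keeping gains and losses separate matters because viewers judge added artifacts (tiling, noise) differently from removed detail (blurring), so the two are weighted independently when parameters are combined into a score.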
Perceptual Filter: Spatial Information (SI)
- Like Sobel, but larger filter size
  - 13 x 13 for VGA, SD and HDTV (si13)
- Missing edges: blurring, smearing
- Added edges: tiling, edge busyness, lines
[Diagram: si13 filter masks; horizontal bandpass rows weighted -W_N ... -W_2 -W_1 0 W_1 W_2 ... W_N, with a vertical lowpass]
Perceptual Filter: Horizontal/Vertical (HV)
[Diagram: si13 filters compute H(i,j,t) and V(i,j,t); edge responses shown in polar form (R, theta) with thresholds r_min and 2*theta]
Spatial-Temporal (S-T) Region
- Horizontal width (Δh)
- Vertical width (Δv)
- Temporal width (Δt)
[Diagram: video frames F_k through F_k+5 partitioned into S-T regions]
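An S-T region is just a block of Δh x Δv pixels spanning Δt frames, and the model computes one feature value per region. A minimal sketch of the partitioning, using the mean pixel value as a stand-in feature (the real features are edge-energy statistics from the perceptual filters, and all names here are illustrative):

```python
def st_regions(frames, width, h=8, v=8, t=6):
    """Partition a clip into spatial-temporal (S-T) regions of
    h x v pixels by t frames; return one feature (here, the mean
    pixel value) per region. frames: list of flat row-major frames."""
    height = len(frames[0]) // width
    feats = []
    for f0 in range(0, len(frames) - t + 1, t):       # temporal blocks
        for y0 in range(0, height - v + 1, v):        # vertical blocks
            for x0 in range(0, width - h + 1, h):     # horizontal blocks
                px = [frames[f0 + f][(y0 + y) * width + (x0 + x)]
                      for f in range(t) for y in range(v) for x in range(h)]
                feats.append(sum(px) / len(px))
    return feats
```

Running the same partitioning on the original and the processed clip yields two aligned feature lists, which is exactly what the gain/loss comparison step consumes.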
NTIA General Model
- Design goal: generally applicable to all video systems
- Seven parameters
  - S-T size (8 x 8) or (4 x 4)
  - 0.2 sec or 1 frame
- Limitations
  - Trained on SD, CIF
  - Few transmission errors
  - One overall delay
NTIA General Model (VQM)
- Finalized: 2000 / 2001
- Validated and standardized
  - ANSI T1.801.03-2003
  - ITU-T Rec. J.144
  - ITU-R Rec. BT.1683
- Summary paper
M. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Transactions on Broadcasting, September 2004.
NTIA General Model (VQM)
- Training data: 1,536 clips (VQM correlation = 0.948)
- VQEG validation data
  - 525-line: VQM = 0.938, PSNR = 0.804
  - 625-line: VQM = 0.886, PSNR = 0.733
Developer's Model
- Design goal
  - Slight drop in accuracy
  - 6 to 10 times faster
- Finalized: 2000 / 2001
- Training correlation
  - PSNR: 0.895
  - General Model: 0.948
  - Developer's Model: 0.940
- Not validated
- Not recommended for color impairments, noise, or transmission errors
Low Bandwidth Model & Fast Low Bandwidth Model
- Developed 2003-2004, with small changes until 2006
- Design goal: accuracy of the General Model at 10 kbits/s, HDTV through QCIF
- Both models validated
- Fast model standardized: ITU-T Rec. J.249
- Summary paper
M. Pinson and S. Wolf, "Low bandwidth reduced reference video quality monitoring system," VPQM, January 2005.
Fast Low Bandwidth Model Overview
- Bandwidth for SD
  - Model: 12-14 kbits/s
  - Calibration: 22-24 kbits/s
- Seven parameters
- Limitations
  - One overall delay
- What is new
  - S-T size: 30 x 30 x 1 sec
  - SI filter adapts size
  - Spatial alignment error compensation
  - More training data: HDTV, SD, VGA, CIF, QCIF
  - Transmission errors
Fast Low Bandwidth Model
- Validation, 525-line
  - PSNR: 0.826
  - Low BW: 0.855
  - Fast Low BW: 0.882
- Validation, 625-line
  - PSNR: 0.857
  - Low BW: 0.828
  - Fast Low BW: 0.866
- Statistically equivalent to or better than PSNR
- But difficult to deploy
Video Quality Model for Variable Frame Delay (VQM_VFD)
- Finalized August 2011
- Design goals
  - Variable frame delay
  - Adapt to viewing distance
  - 0.90 correlation for all resolutions (training)
- Training data: 83 datasets, 11,255 clips
- Not validated
- Fully disclosed
S. Wolf and M. Pinson, "Video quality model for variable frame delay (VQM_VFD)," NTIA TM-11-482, September 2011.
Video Quality Model for Variable Frame Delay (VQM_VFD) Overview
- Parameters
  - Six frame comparisons
  - Two VFD patterns
- NTIA's most accurate model
- Not recommended for color impairments
- What is new
  - Compare each output frame to the best matching input frame
  - Variable frame delay (VFD)
  - Neural network (70% training, 30% testing)
  - S-T size based on the angle seen by the eye
VQM_VFD
- Training correlation
  - QCIF: 0.91
  - CIF: 0.91
  - VGA: 0.90
  - SD: 0.91
  - HD: 0.90
- Well behaved
- Few outliers
MATLAB Run Time Comparison
- 10 sec VGA clip, 30 fps
- High-powered PC: six-core Intel Xeon, 12 GB RAM, solid state drive

  Model                            Year      Run time
  General (FR cal.)                2001      1 min
  Developer's (FR cal.)            2001      ½ min
  Fast Low Bandwidth (RR cal.)     2006      1 min
  VQM_VFD (FR cal.)                2011      2½ min
  PSNR, luma (FR cal.)             unknown   ½ min
  PSNR search, ±3 pixels, ±1 sec   2007      160 min
Video Quality Metric (VQM) Software
- Download: www.its.bldrdoc.gov
- Free for commercial and non-commercial use
- Compiled MATLAB source code
- BVQM
  - Graphical user interface
  - Multiple clips
- CVQM
  - Command line interface
  - One clip
Steps to an Objective Video Quality Metric
- Truth data
- Calibrate impaired video
- Model perception
- Validation
Break
Video Quality Experts Group (VQEG)
- Brings international experts together
  - Industry
  - Academia
- Venue
  - Email reflectors
  - Open meetings
  - Free
  - www.vqeg.org
- Vision
  - Advance the field of video quality assessment
    - Subjective video quality experiments
    - Validate objective video quality models
  - Collaborate to develop new techniques
VQEG's Role in Standards
- Accuracy
  - How well an objective model matches subjective scores
- Model validation
  - Unbiased evaluation
  - Prove accuracy
- Subjective testing
  - How to perform it
  - Adapt to new technology
  - Combine resources
VQEG Current Focus: Hybrid Perceptual / Bit-Stream Models
- Hybrid Model Test Plan
[Diagram: transmission system showing bit-stream and hybrid perceptual / bit-stream tap points]
VQEG Joint Effort Group (JEG)
- JEG-Hybrid group
  - Hybrid video quality model
    - Output video
    - Bit-stream
- Open collaboration
  - Subjective testing
  - Parse bit-stream
  - Metrics from different developers
  - Model
VQEG Model Validation
- Write test plan
  - Model requirements
  - Evaluation criteria
  - Subjective test limits
- Independent Lab Group (ILG)
  - Independent arbitrators
  - Unbiased testing
  - Dependable analysis
- Perform test plan
  - Model submission
  - Source video selection
  - Impairment selection
  - Subjective testing
  - Model evaluation
  - Option to withdraw
- Final report
International Telecommunication Union (ITU) Recs Resulting from VQEG Tests
- VQEG analyzes models; ITU decides which models to standardize
- FR and RR
- Standard definition
  - ITU-T Rec. J.144
  - ITU-R Rec. BT.1683
  - ITU-T Rec. J.249
  - ITU-T Rec. J.340
- VGA/CIF/QCIF
  - ITU-T Rec. J.247
  - ITU-T Rec. J.246
  - ITU-R Rec. BT.1866
  - ITU-R Rec. BT.1867
- HDTV
  - ITU-T Rec. J.341
  - ITU-T Rec. J.342
- PSNR
  - ITU-T Rec. J.340
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data
  - Calibrate impaired video
  - Model perception
  - Validation
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Validating Flexibility: Finding High Quality Video
- Royalty-free test material is scarce
  - Large files, distribution
- Purchased content
  - Low quality
  - Usage limitations
  - Time limitations
  - Internal only
- Content owners
  - Licensing
  - Litigation
- ANSI T1.801.01 standard sequences
Consumer Digital Video Library (CDVL) at www.cdvl.org
- Royalty-free video
- High quality
- Standard legal agreement
  - Protects contributors' rights for commercial applications
  - Explains users' rights
  - Privacy
- Users share video
- Automated redistribution
[Image: NTIA "outdoor mall with tulips" (1e), 1080p 25 fps]
www.cdvl.org
- Uncompressed AVI in YUV color space
- Variety of source videos
  - 1080p 29.97 fps, 1080p 25 fps
  - 1080i 59.94 fps, 1080i 50 fps
  - NTSC & PAL
  - VGA, CIF, QCIF
  - 3DTV (coming soon)
[Image: NTIA "Aspen Trees in Fall Color, Rapid Scene Cuts," 1080p 29.97 fps]
These uses are allowed
- Internal research & development
- Technical papers
- Conference research presentations
- Standards committees
- Product development & improvement
- Educational demos
[Image: NTIA "Rainbow Collage Zooming In," 59.94 fps interlaced]
These uses are NOT allowed
- Product marketing
  - Re-publishing of snapshots in product brochures
  - Redistribution with commercial products
- Performing subjective tests for profit
- Use of clips in television shows or commercials
[Image: PSCR "EMS - Burn Patient, Hospital Call-In," 1080i 59.94 fps]
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data
  - Calibrate impaired video
  - Model perception
  - Validation
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Video Quality Subjective Testing
- International standards
  - ITU-R Rec. BT.500 (video)
  - ITU-T Rec. P.910 (video)
  - ITU-T Rec. P.911 (audio-video)
- Pristine laboratory environment
  - Controlled lighting
  - Grey walls
  - No background noise
  - Monitor calibration
  - Uncompressed playback
ITU-R BT.500 & ITU-T P.910
- Living room paradigm
  - CRT television
  - Reliable link
  - Quiet, non-distracting environment
- Repeatable mean opinion scores (MOS)
  - Equipment comparison
  - Just noticeable differences
- Goal: remove the impact of unwanted variables
Video Technology Changes
- Flat screen, LCD
  - Television
  - Computer monitor
- Portable devices
  - Laptop, tablet, smartphone
- Unreliable delivery
- Busy, distracting environment
- Move between devices
Draft New Recommendation
- Best practices
- Multiple environments
  - Laboratory
  - Public
- Modify subjective scales
- Non-separable interactions
  - Compressed material
  - Audio & video
- Mandatory reporting
[Photos: public and laboratory environments]
Single Stimulus Rating Scales
- Absolute Category Rating (ACR)
  - One stimulus
  - 5 levels: excellent, good, fair, poor, bad
- Variants
  - 9 levels: add levels between the words
  - 11 levels: add levels between the words, plus best and worst
  - Hidden reference
  - Alternate words
Double Stimulus Rating Scales
- Degradation Category Rating (DCR), aka Double Stimulus Impairment Scale (DSIS)
  - Reference presented first
  - Rate the difference: how well does the impaired sequence reproduce the reference?
- Comparison Category Rating (CCR), aka Double Stimulus Comparison Scale (DSCS)
  - Random presentation order
  - 7-level discrete scale (better ... worse)
NTT Study: Choice of Rating Scale
- Compare four different types of rating scales
  - Discrete vs. continuous
  - Single stimulus vs. double stimulus
  - 5 level vs. 11 level
- Same videos
- One set of subjects
- Questionnaire: ease of use
- Assessment time
T. Tominaga, T. Hayashi, J. Okamoto, and A. Takahashi, "Performance comparisons of subjective quality assessment methods for mobile video," Quality of Multimedia Experience (QoMEX), June 2010.
NTT Study: Results
- Correlations very high for all methods
- Rating scale has minor impact on data accuracy
- Ease of use: 5 = easy, 1 = difficult

  Method         Assessment time   Ease of use
  ACR 5 level    12 sec            4.33
  ACR 11 level   14 sec            3.25
  DCR            20 sec            3.92
  SAMVIQ         29 sec            3.48
  DSCQS          41 sec            3.31
IEEE Paper: Number of Rating Levels
- Compare MOS from four ACR scales
  - 5 level vs. 9 level
  - Discrete vs. continuous
- Same videos
- Different groups of subjects
- No statistically significant difference
- Repeatable results
  - Single stimulus
Q. Huynh-Thu, M. Garcia, F. Speranza, P. Corriveau, and A. Raake, "Study of rating scales for subjective quality assessment of high-definition video," IEEE Transactions on Broadcasting, vol. 57, no. 1, pp. 1-14, March 2011.
VQEG Study: Impact of Environment
- Video Quality Experts Group (VQEG)
  - Two ITS authors
- Audiovisual quality
- Six laboratories
- Four countries: France, Germany, Poland & USA
"The Influence of Subjects and Environment on Audiovisual Subjective Tests: An International Study," submitted to IEEE.
Impact of Environment
- Country
- Native language
- Audio presentation device
  - Speakers / earbuds / headphones
- Monitor size (7" to 42")
- Viewing distance, angle (8º to 20º)
- Screen brightness, color (calibrated, default)
- Lighting (20 lux to 200 lux)
- Controlled laboratory / public area
Impact of Environment
- 10 datasets
- 6 labs
- 6 controlled environments
- 4 public environments
Impact of Environment: Dataset-to-Dataset Correlations

       1     2     3     4     5     6     7     8     9     10
  1   1.00  0.95  0.98  0.97  0.97  0.97  0.98  0.97  0.96  0.95
  2   0.95  1.00  0.95  0.94  0.94  0.93  0.96  0.94  0.93  0.93
  3   0.98  0.95  1.00  0.98  0.98  0.98  0.99  0.98  0.97  0.97
  4   0.97  0.94  0.98  1.00  0.98  0.96  0.97  0.97  0.96  0.96
  5   0.97  0.94  0.98  0.98  1.00  0.96  0.97  0.96  0.97  0.96
  6   0.97  0.93  0.98  0.96  0.96  1.00  0.99  0.97  0.97  0.95
  7   0.98  0.96  0.99  0.97  0.97  0.99  1.00  0.97  0.97  0.96
  8   0.97  0.94  0.98  0.97  0.96  0.97  0.97  1.00  0.96  0.96
  9   0.96  0.93  0.97  0.96  0.97  0.97  0.97  0.96  1.00  0.97
  10  0.95  0.93  0.97  0.96  0.96  0.95  0.96  0.96  0.97  1.00
  #    28    9     34    25    25    24    24    15    14    15
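Agreement between two environments, as in the matrix above, is typically reported as the Pearson linear correlation between the per-clip mean opinion scores from each dataset. A minimal pure-Python sketch (the function name and list interface are illustrative, not from the study's analysis code):

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two labs' per-clip MOS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den
```

Values of 0.93 and above, as seen throughout the matrix, indicate that the rank ordering of clips was essentially preserved across the controlled and public environments.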
Impact of Environment
[Figure]
Impact of Environment
- Factors that did not seem to matter
  - Native language / speech comprehension
  - Culture / country of origin
  - Lighting
  - Background noise
  - Wall color
  - Objects on the wall
  - Viewing distance
  - Monitor calibration
  - Color blindness
  - Vision good but not 20/20
  - Hearing (not tested)
  - Translation of ACR scale labels
AES Paper: Bias in Subjective Testing
- Summary paper
- Perceptual meaning of ACR labels
  - Continuous scale
  - Subjects choose the relative magnitude of ACR words
  - Different languages
- Label locations shift
- No language is perfectly spaced
S. Zielinski, F. Rumsey, and S. Bech, "On some biases encountered in modern audio quality listening tests - a review," Journal of the Audio Engineering Society, vol. 56, no. 6, June 2008.
Bias in Subjective Testing
- Subjective experiment
  - ACR words vs. unlabeled scale
  - Identical results
- Ratings not impacted by
  - Translation of labels
  - Slightly uneven perceptual distribution
- Labels may be modified to suit the test
  - Test accuracy not impacted
Draft New Recommendation
- ITU-T Study Group 9
- Audio & video subjective tests, modern paradigm
- Complete subjective testing guide
  - Easy to implement
  - Avoid unnecessary constraints
- Experiments where ITU-R Rec. BT.500 is not suitable
- Seeking comments: submit through VQEG
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data
  - Calibrate impaired video
  - Model perception
  - Validation
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Task-Based Subjective Testing
- What is the license plate number?
Task-Based Subjective Testing
- Ask subjects to identify objects
Task-Based Subjective Testing
- Public Safety Video Quality (PSVQ)
  - Level of video quality needed for public safety
  - Firefighters, police, emergency medical services
- Quality is the wrong question
  - Ability to perform a task
- Practitioner participation
- ITU-T Rec. P.912
  - How to perform task-based subjective tests
- Different standards bodies
Task-Based Subjective Testing
- ITU-T Rec. P.912
  - How to perform task-based subjective tests
- Association of Public-Safety Communications Officials (APCO) International
  - Guidance for configuration of interoperable public safety broadband communications
- National Public Safety Telecommunications Council (NPSTC)
  - Standards for interoperable public safety broadband communications
Margaret Pinson mpinson@its.bldrdoc.gov