Margaret H. Pinson mpinson@its.bldrdoc.gov
Introductions
- Institute for Telecommunication Sciences
  - U.S. Department of Commerce
  - Technology transfer
  - Impartial
  - Basic research
- Margaret H. Pinson
  - Video quality research
  - Standards leadership
  - Over 20 years
4/25/2012
What We Do
- Video quality
  - Measure quality of transmitted moving video
  - Focus on transmission
  - Various transmission and display systems
  - Entertainment and video conferencing
- Subjective testing
  - Carefully designed experiments
  - Ask people their opinion
  - Expensive and time consuming
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data: subjective tests
  - ITS calibration algorithms
  - ITS objective video quality models
- Video Quality Experts Group (VQEG)
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Metrics for Analog Video
- Test cards
  - Static patterns
  - Adjust television and camera
- RCA Indian Head test pattern
- SMPTE color bars
Metrics for Digital Video
- Quality changes
- Variable delay
- Scene matters
- Network impairments
[Diagram: digital video transmission system; Video Input -> Encoder -> Digital Channel -> Decoder -> Video Output]
Types of Metrics
- Full reference (FR)
- Reduced reference (RR)
- No reference (NR)
- Bit-stream
- Hybrid perceptual / bit-stream
Types of Metrics
[Diagram: transmission system showing where Full Reference (FR) and Reduced Reference (RR) metrics tap the video]
Types of Metrics
[Diagram: transmission system showing tap points for Full Reference (FR), Reduced Reference (RR), No Reference (NR), bit-stream, and hybrid perceptual / bit-stream metrics]
ITS Approach to Video Metrics
- Reduced reference (RR)
  - Needs access to the original video
  - Bandwidth between input / output
    - Side channel carried with the video
    - Separate channel
- Reasons
  - Accuracy
  - In-service monitoring
Steps to an Objective Video Quality Model
- Truth data
- Calibrate impaired video
- Model perception
- Validation
Truth Data, 1992 to 1999
- 11 subjective video quality tests
  1. 42 clips
  2. 105 clips
  3. 112 clips
  4. 90 clips
  5. 90 clips
  6. 90 clips
  7. 90 clips
  8. 600 clips
  9. 132 clips
  10. 164 clips
  11. 48 clips
- Standard definition (NTSC), CIF
- Variety of systems
  - 128 kb/s to 45 Mb/s
  - Network errors
  - Digital codecs
  - Analog noise
- 9 to 10 second clips
- Train one model on 11 datasets?
Why Subjective Datasets Cannot Be Directly Compared
[Scatter plots: raw data versus fitted data]
M. Pinson and S. Wolf, "An objective method for combining multiple subjective data sets," SPIE Video Communications and Image Processing Conference, Lugano, Switzerland, July 2003.
Iterated Nested Least-Squares Algorithm (INLSA)
- Combine multiple subjective datasets
- Pick objective video quality metrics
  - Accurate enough to be meaningful
  - Several metrics to reduce error
- INLSA finds a mapping
  - Linear fit
  - Single subjective scale
S. D. Voran, "An iterated nested least-squares algorithm for fitting multiple data sets," NTIA Technical Memorandum TM-03-397, Oct. 2002.
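The alternation at the heart of INLSA can be sketched in a few lines of Python: iterate between (a) defining a common scale by a least-squares fit of the pooled, remapped subjective scores against an objective metric, and (b) refitting each dataset's linear map onto that common scale. This is a toy, single-metric illustration under assumed interfaces (dicts of lists), not the algorithm as specified in Voran's memorandum, which handles several metrics and weighting.

```python
def _linfit(x, y):
    """Least-squares line y ~ m*x + c."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return m, my - m * mx

def inlsa(subj, obj, iters=20):
    """Toy INLSA sketch: subj and obj map dataset name -> per-clip
    subjective scores and objective metric values. Returns a linear
    map (a, b) per dataset that puts its scores on one common scale."""
    maps = {k: (1.0, 0.0) for k in subj}   # start with identity maps
    for _ in range(iters):
        # Common scale: fit pooled remapped scores against the metric.
        xs = [v for k in subj for v in obj[k]]
        ys = [maps[k][0] * s + maps[k][1] for k in subj for s in subj[k]]
        m, c = _linfit(xs, ys)
        # Refit each dataset's linear map onto the common scale.
        for k in subj:
            target = [m * v + c for v in obj[k]]
            maps[k] = _linfit(subj[k], target)
    return maps
```

With two synthetic datasets that rated the same underlying qualities on differently stretched scales, the returned maps bring both onto one scale.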
Steps to an Objective Video Quality Metric
- Truth data
- Calibrate impaired video
- Model perception
- Validation
ITS Calibration Algorithms
- Full reference calibration
- Reduced reference calibration
- PSNR exhaustive search
Calibration
- Time alignment
- Spatial scaling
- Spatial shift
- Valid region
  - SD, HDTV, 3DTV
  - Border not seen
- Luma gain & offset
- Color gain & offset
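Of the calibration steps above, luma gain and offset is the simplest to illustrate: model the processed pixels as gain * original + offset and solve by least squares. The sketch below is a pure-Python illustration with hypothetical names, not the ITS implementation, which estimates gain/offset on spatially aligned, subsampled frames.

```python
def luma_gain_offset(orig, proc):
    """Estimate luma gain and offset of a processed clip relative to
    the original, modelling proc ~ gain * orig + offset and solving
    by ordinary least squares over paired pixel values."""
    n = len(orig)
    mean_o = sum(orig) / n
    mean_p = sum(proc) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(orig, proc))
    var = sum((o - mean_o) ** 2 for o in orig)
    gain = cov / var
    offset = mean_p - gain * mean_o
    return gain, offset
```

Removing the estimated gain and offset from the processed video before feature extraction keeps a simple brightness/contrast change from being scored as a quality impairment.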
Full Reference Calibration
- Design goals
  - Accurate
  - Fast
- Validated and standardized
  - ANSI T1.801.03-2003
  - ITU-T Rec. J.144
  - ITU-R Rec. BT.1683
- Fully disclosed
S. Wolf and M. Pinson, "Video quality measurement techniques," NTIA Report 02-392, June 2002.
M. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Transactions on Broadcasting, September 2004.
Reduced Reference Calibration
- Design goals
  - Run in-service with a data channel
  - Similar run time
  - Slightly less accurate, but not by much
- 22 kb/s for standard definition
- 525-line spatial scaling
- Validated and standardized
  - ITU-T Rec. J.244
- Fully disclosed
M. H. Pinson and S. Wolf, "Reduced reference calibration algorithms," NTIA Technical Report TR-06-433, October 2005.
PSNR Exhaustive Search Calibration
- Design goal: very accurate
- Maximize peak signal-to-noise ratio (PSNR)
  - Exhaustive search of spatial & temporal shifts
  - Ideal luma gain/offset
- Painfully slow
  - Limits the search range
- Baseline PSNR measurement
- ITU-T Rec. J.340 (2010)
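The exhaustive search can be sketched as a brute-force loop over candidate shifts, keeping whichever alignment minimizes mean squared error (equivalently, maximizes PSNR). This minimal pure-Python sketch covers spatial shifts only and uses hypothetical names; the J.340 procedure also searches temporal shifts and fits luma gain/offset at each candidate.

```python
def mse(ref, deg):
    """Mean squared error between two equal-length pixel lists."""
    return sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)

def best_shift(ref_frame, deg_frame, width, max_shift=1):
    """Exhaustively search horizontal/vertical shifts of the degraded
    frame against the reference; return the (dx, dy) that minimizes
    MSE over the overlapping region. Frames are flat row-major lists."""
    height = len(ref_frame) // width
    best = (None, float("inf"))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r_px, d_px = [], []
            for y in range(height):
                for x in range(width):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        r_px.append(ref_frame[y * width + x])
                        d_px.append(deg_frame[yy * width + xx])
            err = mse(r_px, d_px)
            if err < best[1]:
                best = ((dx, dy), err)
    return best[0]
```

Even this tiny loop shows why the full search is "painfully slow": cost grows with the product of the spatial range, the temporal range, and the frame size.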
Steps to an Objective Video Quality Metric
- Truth data
- Calibrate impaired video
- Model perception
- Validation
Perceptual Video Quality Models
- Objective model
  - Predicts human perception
  - Judgment
- Flexible
  - New scenes
  - New video coders
  - Network errors
Objective Models
- Peak signal-to-noise ratio (PSNR)
- NTIA General Model (VQM)
- Developer's Model
- Low Bandwidth Model
- Fast Low Bandwidth Model
- Video Quality Model for Variable Frame Delay (VQM_VFD)
Peak Signal-to-Noise Ratio (PSNR)
- Logical extension of signal-to-noise ratio
- Full reference metric
  - Pixel-by-pixel comparison
  - Usually luma only
- PSNR = 20 log10(255 / RMSE(Yin, Yout))
- Logarithmic decibel scale
- Not perceptual
- Needs perfect calibration
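The formula above translates directly into code. A minimal pure-Python sketch (function name and flat-list interface are illustrative; real implementations operate on decoded luma planes):

```python
import math

def psnr(ref, deg, peak=255.0):
    """Peak signal-to-noise ratio between two equal-size luma frames.

    ref, deg: flat lists of pixel values (luma plane only, as is typical).
    peak: maximum pixel value (255 for 8-bit video).
    """
    assert len(ref) == len(deg) and len(ref) > 0
    mse = sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)
    if mse == 0:
        return float("inf")  # identical frames
    return 20 * math.log10(peak / math.sqrt(mse))

# A frame where every pixel differs by 5 gray levels has RMSE = 5,
# so PSNR = 20 * log10(255 / 5), about 34.15 dB.
print(round(psnr([100, 100, 100, 100], [105, 105, 105, 105]), 2))
```

Note that PSNR says nothing about where the error sits, which is one reason it correlates less well with perception than the models below.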
Peak Signal-to-Noise Ratio (PSNR)
- Many formulae used, often not reported
  - Color PSNR
  - Individual frame delay
  - Sub-sampled video
  - Different peak value (235 versus 255)
  - Frame PSNR, then average
- Accuracy?
ITS Model Design Overview
1. Perceptual filter
2. Spatial-temporal (S-T) regions
3. Features
4. Perceptibility masking
5. Comparison (original versus processed)
6. Separate gains from losses
7. Parameter
   - Often worst 5%
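Steps 5 to 7 above can be sketched compactly: compare per-region feature values between original and processed video, split the comparisons into gains (feature increased, e.g. added edges) and losses (feature decreased, e.g. blurring), then pool each side with a worst-case average. The names, the log-ratio comparison, and the pooling details below are illustrative assumptions, not the published parameter definitions.

```python
import math

def worst_fraction(vals, high, frac=0.05):
    """Average the worst `frac` of the values: the largest if high=True
    (gains), the most negative if high=False (losses)."""
    vals = sorted(vals, reverse=high)
    k = max(1, int(round(frac * len(vals))))
    return sum(vals[:k]) / k

def compare_features(orig, proc):
    """Compare per-S-T-region feature values (e.g. edge energy) between
    original and processed video; return (gain, loss) parameter values
    pooled over the worst 5% of regions."""
    ratios = [math.log10(max(p, 1e-6) / max(o, 1e-6))
              for o, p in zip(orig, proc)]
    gains = [r for r in ratios if r > 0] or [0.0]
    losses = [r for r in ratios if r < 0] or [0.0]
    return (worst_fraction(gains, high=True),
            worst_fraction(losses, high=False))
```

Keeping gains and losses separate matters because viewers judge added artifacts (tiling, noise) differently from removed detail (blurring), so the two are weighted independently when parameters are combined into a score.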
Perceptual Filter: Spatial Information (SI)
- Like Sobel, but larger filter size
  - 13 x 13 for VGA, SD and HDTV (si13)
- Missing edges: blurring, smearing
- Added edges: tiling, edge busyness, lines
[Diagram: si13 filter masks; horizontal bandpass rows weighted -W_N ... -W_2 -W_1 0 W_1 W_2 ... W_N, with a vertical lowpass]
Perceptual Filter: Horizontal/Vertical (HV)
[Diagram: si13 filters compute H(i,j,t) and V(i,j,t); edge responses shown in polar form (R, theta) with thresholds r_min and 2*theta]
Spatial-Temporal (S-T) Region
- Horizontal width (Δh)
- Vertical width (Δv)
- Temporal width (Δt)
[Diagram: video frames F_k through F_k+5 partitioned into S-T regions]
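An S-T region is just a block of Δh x Δv pixels spanning Δt frames, and the model computes one feature value per region. A minimal sketch of the partitioning, using the mean pixel value as a stand-in feature (the real features are edge-energy statistics from the perceptual filters, and all names here are illustrative):

```python
def st_regions(frames, width, h=8, v=8, t=6):
    """Partition a clip into spatial-temporal (S-T) regions of
    h x v pixels by t frames; return one feature (here, the mean
    pixel value) per region. frames: list of flat row-major frames."""
    height = len(frames[0]) // width
    feats = []
    for f0 in range(0, len(frames) - t + 1, t):       # temporal blocks
        for y0 in range(0, height - v + 1, v):        # vertical blocks
            for x0 in range(0, width - h + 1, h):     # horizontal blocks
                px = [frames[f0 + f][(y0 + y) * width + (x0 + x)]
                      for f in range(t) for y in range(v) for x in range(h)]
                feats.append(sum(px) / len(px))
    return feats
```

Running the same partitioning on the original and the processed clip yields two aligned feature lists, which is exactly what the gain/loss comparison step consumes.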
NTIA General Model
- Design goal: generally applicable to all video systems
- Seven parameters
  - S-T size (8 x 8) or (4 x 4)
  - 0.2 sec or 1 frame
- Limitations
  - Trained on SD, CIF
  - Few transmission errors
  - One overall delay
NTIA General Model (VQM)
- Finalized: 2000 / 2001
- Validated and standardized
  - ANSI T1.801.03-2003
  - ITU-T Rec. J.144
  - ITU-R Rec. BT.1683
- Summary paper
M. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Transactions on Broadcasting, September 2004.
NTIA General Model (VQM)
- Training data: 1,536 clips (VQM correlation = 0.948)
- VQEG validation data
  - 525-line: VQM = 0.938, PSNR = 0.804
  - 625-line: VQM = 0.886, PSNR = 0.733
Developer's Model
- Design goal
  - Slight drop in accuracy
  - 6 to 10 times faster
- Finalized: 2000 / 2001
- Training correlation
  - PSNR: 0.895
  - General Model: 0.948
  - Developer's Model: 0.940
- Not validated
- Not recommended for color impairments, noise, or transmission errors
Low Bandwidth Model & Fast Low Bandwidth Model
- Developed 2003-2004, with small changes until 2006
- Design goal: accuracy of the General Model at 10 kbits/s, HDTV through QCIF
- Both models validated
- Fast model standardized: ITU-T Rec. J.249
- Summary paper
M. Pinson and S. Wolf, "Low bandwidth reduced reference video quality monitoring system," VPQM, January 2005.
Fast Low Bandwidth Model Overview
- Bandwidth for SD
  - Model: 12-14 kbits/s
  - Calibration: 22-24 kbits/s
- Seven parameters
- Limitations
  - One overall delay
- What is new
  - S-T size: 30 x 30 x 1 sec
  - SI filter adapts size
  - Spatial alignment error compensation
  - More training data: HDTV, SD, VGA, CIF, QCIF
  - Transmission errors
Fast Low Bandwidth Model
- Validation, 525-line
  - PSNR: 0.826
  - Low BW: 0.855
  - Fast Low BW: 0.882
- Validation, 625-line
  - PSNR: 0.857
  - Low BW: 0.828
  - Fast Low BW: 0.866
- Statistically equivalent to or better than PSNR
- But difficult to deploy
Video Quality Model for Variable Frame Delay (VQM_VFD)
- Finalized August 2011
- Design goals
  - Variable frame delay
  - Adapt to viewing distance
  - 0.90 correlation for all resolutions (training)
- Training data: 83 datasets, 11,255 clips
- Not validated
- Fully disclosed
S. Wolf and M. Pinson, "Video quality model for variable frame delay (VQM_VFD)," NTIA TM-11-482, September 2011.
Video Quality Model for Variable Frame Delay (VQM_VFD) Overview
- Parameters
  - Six frame comparisons
  - Two VFD patterns
- NTIA's most accurate model
- Not recommended for color impairments
- What is new
  - Compare each output frame to the best matching input frame
  - Variable frame delay (VFD)
  - Neural network (70% training, 30% testing)
  - S-T size based on the angle seen by the eye
VQM_VFD
- Training correlation
  - QCIF: 0.91
  - CIF: 0.91
  - VGA: 0.90
  - SD: 0.91
  - HD: 0.90
- Well behaved
- Few outliers
MATLAB Run Time Comparison
- 10 sec VGA clip, 30 fps
- High-powered PC: six-core Intel Xeon, 12 GB RAM, solid state drive

  Model                            Year      Run time
  General (FR cal.)                2001      1 min
  Developer's (FR cal.)            2001      ½ min
  Fast Low Bandwidth (RR cal.)     2006      1 min
  VQM_VFD (FR cal.)                2011      2½ min
  PSNR, luma (FR cal.)             unknown   ½ min
  PSNR search, ±3 pixels, ±1 sec   2007      160 min
Video Quality Metric (VQM) Software
- Download: www.its.bldrdoc.gov
- Free for commercial and non-commercial use
- Compiled MATLAB source code
- BVQM
  - Graphical user interface
  - Multiple clips
- CVQM
  - Command line interface
  - One clip
Steps to an Objective Video Quality Metric
- Truth data
- Calibrate impaired video
- Model perception
- Validation
Break
Video Quality Experts Group (VQEG)
- Brings international experts together
  - Industry
  - Academia
- Venue
  - Email reflectors
  - Open meetings
  - Free
  - www.vqeg.org
- Vision
  - Advance the field of video quality assessment
    - Subjective video quality experiments
    - Validate objective video quality models
  - Collaborate to develop new techniques
VQEG's Role in Standards
- Accuracy
  - How well an objective model matches subjective scores
- Model validation
  - Unbiased evaluation
  - Prove accuracy
- Subjective testing
  - How to perform it
  - Adapt to new technology
  - Combine resources
VQEG Current Focus: Hybrid Perceptual / Bit-Stream Models
- Hybrid Model Test Plan
[Diagram: transmission system showing bit-stream and hybrid perceptual / bit-stream tap points]
VQEG Joint Effort Group (JEG)
- JEG-Hybrid group
  - Hybrid video quality model
    - Output video
    - Bit-stream
- Open collaboration
  - Subjective testing
  - Parse bit-stream
  - Metrics from different developers
  - Model
VQEG Model Validation
- Write test plan
  - Model requirements
  - Evaluation criteria
  - Subjective test limits
- Independent Lab Group (ILG)
  - Independent arbitrators
  - Unbiased testing
  - Dependable analysis
- Perform test plan
  - Model submission
  - Source video selection
  - Impairment selection
  - Subjective testing
  - Model evaluation
  - Option to withdraw
- Final report
International Telecommunication Union (ITU) Recs Resulting from VQEG Tests
- VQEG analyzes models; ITU decides which models to standardize
- FR and RR
- Standard definition
  - ITU-T Rec. J.144
  - ITU-R Rec. BT.1683
  - ITU-T Rec. J.249
  - ITU-T Rec. J.340
- VGA/CIF/QCIF
  - ITU-T Rec. J.247
  - ITU-T Rec. J.246
  - ITU-R Rec. BT.1866
  - ITU-R Rec. BT.1867
- HDTV
  - ITU-T Rec. J.341
  - ITU-T Rec. J.342
- PSNR
  - ITU-T Rec. J.340
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data
  - Calibrate impaired video
  - Model perception
  - Validation
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Validating Flexibility: Finding High Quality Video
- Royalty-free test material is scarce
  - Large files, distribution
- Purchased content
  - Low quality
  - Usage limitations
  - Time limitations
  - Internal only
- Content owners
  - Licensing
  - Litigation
- ANSI T1.801.01 standard sequences
Consumer Digital Video Library (CDVL) at www.cdvl.org
- Royalty-free video
- High quality
- Standard legal agreement
  - Protects contributors' rights for commercial applications
  - Explains users' rights
  - Privacy
- Users share video
- Automated redistribution
[Image: NTIA "outdoor mall with tulips" (1e), 1080p 25 fps]
www.cdvl.org
- Uncompressed AVI in YUV color space
- Variety of source videos
  - 1080p 29.97 fps, 1080p 25 fps
  - 1080i 59.94 fps, 1080i 50 fps
  - NTSC & PAL
  - VGA, CIF, QCIF
  - 3DTV (coming soon)
[Image: NTIA "Aspen Trees in Fall Color, Rapid Scene Cuts," 1080p 29.97 fps]
These uses are allowed
- Internal research & development
- Technical papers
- Conference research presentations
- Standards committees
- Product development & improvement
- Educational demos
[Image: NTIA "Rainbow Collage Zooming In," 59.94 fps interlaced]
These uses are NOT allowed
- Product marketing
  - Re-publishing of snapshots in product brochures
  - Redistribution with commercial products
- Performing subjective tests for profit
- Use of clips in television shows or commercials
[Image: PSCR "EMS - Burn Patient, Hospital Call-In," 1080i 59.94 fps]
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data
  - Calibrate impaired video
  - Model perception
  - Validation
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Video Quality Subjective Testing
- International standards
  - ITU-R Rec. BT.500 (video)
  - ITU-T Rec. P.910 (video)
  - ITU-T Rec. P.911 (audio-video)
- Pristine laboratory environment
  - Controlled lighting
  - Grey walls
  - No background noise
  - Monitor calibration
  - Uncompressed playback
ITU-R BT.500 & ITU-T P.910
- Living room paradigm
  - CRT television
  - Reliable link
  - Quiet, non-distracting environment
- Repeatable mean opinion scores (MOS)
  - Equipment comparison
  - Just noticeable differences
- Goal: remove the impact of unwanted variables
Video Technology Changes
- Flat screen, LCD
  - Television
  - Computer monitor
- Portable devices
  - Laptop, tablet, smartphone
- Unreliable delivery
- Busy, distracting environment
- Move between devices
Draft New Recommendation
- Best practices
- Multiple environments
  - Laboratory
  - Public
- Modify subjective scales
- Non-separable interactions
  - Compressed material
  - Audio & video
- Mandatory reporting
[Photos: public and laboratory environments]
Single Stimulus Rating Scales
- Absolute Category Rating (ACR)
  - One stimulus
  - 5 levels: excellent, good, fair, poor, bad
- Variants
  - 9 levels: add levels between the words
  - 11 levels: add levels between the words, plus best and worst
  - Hidden reference
  - Alternate words
Double Stimulus Rating Scales
- Degradation Category Rating (DCR), aka Double Stimulus Impairment Scale (DSIS)
  - Reference presented first
  - Rate the difference: how well does the impaired sequence reproduce the reference?
- Comparison Category Rating (CCR), aka Double Stimulus Comparison Scale (DSCS)
  - Random presentation order
  - 7-level discrete scale (better ... worse)
NTT Study: Choice of Rating Scale
- Compare four different types of rating scales
  - Discrete vs. continuous
  - Single stimulus vs. double stimulus
  - 5 level vs. 11 level
- Same videos
- One set of subjects
- Questionnaire: ease of use
- Assessment time
T. Tominaga, T. Hayashi, J. Okamoto, and A. Takahashi, "Performance comparisons of subjective quality assessment methods for mobile video," Quality of Multimedia Experience (QoMEX), June 2010.
NTT Study: Results
- Correlations very high for all methods
- Rating scale has minor impact on data accuracy
- Ease of use: 5 = easy, 1 = difficult

  Method         Assessment time   Ease of use
  ACR 5 level    12 sec            4.33
  ACR 11 level   14 sec            3.25
  DCR            20 sec            3.92
  SAMVIQ         29 sec            3.48
  DSCQS          41 sec            3.31
IEEE Paper: Number of Rating Levels
- Compare MOS from four ACR scales
  - 5 level vs. 9 level
  - Discrete vs. continuous
- Same videos
- Different groups of subjects
- No statistically significant difference
- Repeatable results
  - Single stimulus
Q. Huynh-Thu, M. Garcia, F. Speranza, P. Corriveau, and A. Raake, "Study of rating scales for subjective quality assessment of high-definition video," IEEE Transactions on Broadcasting, vol. 57, no. 1, pp. 1-14, March 2011.
VQEG Study: Impact of Environment
- Video Quality Experts Group (VQEG)
  - Two ITS authors
- Audiovisual quality
- Six laboratories
- Four countries: France, Germany, Poland & USA
"The Influence of Subjects and Environment on Audiovisual Subjective Tests: An International Study," submitted to IEEE.
Impact of Environment
- Country
- Native language
- Audio presentation device
  - Speakers / earbuds / headphones
- Monitor size (7" to 42")
- Viewing distance, angle (8º to 20º)
- Screen brightness, color (calibrated, default)
- Lighting (20 lux to 200 lux)
- Controlled laboratory / public area
Impact of Environment
- 10 datasets
- 6 labs
- 6 controlled environments
- 4 public environments
Impact of Environment: Dataset-to-Dataset Correlations

       1     2     3     4     5     6     7     8     9     10
  1   1.00  0.95  0.98  0.97  0.97  0.97  0.98  0.97  0.96  0.95
  2   0.95  1.00  0.95  0.94  0.94  0.93  0.96  0.94  0.93  0.93
  3   0.98  0.95  1.00  0.98  0.98  0.98  0.99  0.98  0.97  0.97
  4   0.97  0.94  0.98  1.00  0.98  0.96  0.97  0.97  0.96  0.96
  5   0.97  0.94  0.98  0.98  1.00  0.96  0.97  0.96  0.97  0.96
  6   0.97  0.93  0.98  0.96  0.96  1.00  0.99  0.97  0.97  0.95
  7   0.98  0.96  0.99  0.97  0.97  0.99  1.00  0.97  0.97  0.96
  8   0.97  0.94  0.98  0.97  0.96  0.97  0.97  1.00  0.96  0.96
  9   0.96  0.93  0.97  0.96  0.97  0.97  0.97  0.96  1.00  0.97
  10  0.95  0.93  0.97  0.96  0.96  0.95  0.96  0.96  0.97  1.00
  #    28    9     34    25    25    24    24    15    14    15
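Agreement between two environments, as in the matrix above, is typically reported as the Pearson linear correlation between the per-clip mean opinion scores from each dataset. A minimal pure-Python sketch (the function name and list interface are illustrative, not from the study's analysis code):

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two labs' per-clip MOS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den
```

Values of 0.93 and above, as seen throughout the matrix, indicate that the rank ordering of clips was essentially preserved across the controlled and public environments.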
Impact of Environment
[Figure]
Impact of Environment
- Factors that did not seem to matter
  - Native language / speech comprehension
  - Culture / country of origin
  - Lighting
  - Background noise
  - Wall color
  - Objects on the wall
  - Viewing distance
  - Monitor calibration
  - Color blindness
  - Vision good but not 20/20
  - Hearing (not tested)
  - Translation of ACR scale labels
AES Paper: Bias in Subjective Testing
- Summary paper
- Perceptual meaning of ACR labels
  - Continuous scale
  - Subjects choose the relative magnitude of ACR words
  - Different languages
- Label locations shift
- No language is perfectly spaced
S. Zielinski, F. Rumsey, and S. Bech, "On some biases encountered in modern audio quality listening tests - a review," Journal of the Audio Engineering Society, vol. 56, no. 6, June 2008.
Bias in Subjective Testing
- Subjective experiment
  - ACR words vs. unlabeled scale
  - Identical results
- Ratings not impacted by
  - Translation of labels
  - Slightly uneven perceptual distribution
- Labels may be modified to suit the test
  - Test accuracy not impacted
Draft New Recommendation
- ITU-T Study Group 9
- Audio & video subjective tests, modern paradigm
- Complete subjective testing guide
  - Easy to implement
  - Avoid unnecessary constraints
- Experiments where ITU-R Rec. BT.500 is not suitable
- Seeking comments: submit through VQEG
Topics
- Why objective models are needed
- ITS objective video quality models
  - Truth data
  - Calibrate impaired video
  - Model perception
  - Validation
- Consumer Digital Video Library (CDVL)
- Drafting a new subjective testing standard
- Task-based subjective testing
Task-Based Subjective Testing
- What is the license plate number?
Task-Based Subjective Testing
- Ask subjects to identify objects
Task-Based Subjective Testing
- Public Safety Video Quality (PSVQ)
  - Level of video quality needed for public safety
  - Firefighters, police, emergency medical services
- Quality is the wrong question
  - Ability to perform a task
- Practitioner participation
- ITU-T Rec. P.912
  - How to perform task-based subjective tests
- Different standards bodies
Task-Based Subjective Testing
- ITU-T Rec. P.912
  - How to perform task-based subjective tests
- Association of Public-Safety Communications Officials (APCO) International
  - Guidance for configuration of interoperable public safety broadband communications
- National Public Safety Telecommunications Council (NPSTC)
  - Standards for interoperable public safety broadband communications
Margaret Pinson mpinson@its.bldrdoc.gov