Perceptual Coding: Hype or Hope?

QoMEX 2016 Keynote Speech
June 6, 2016
C.-C. Jay Kuo
University of Southern California

Is There Anything Left in Video Coding?
  First asked in the late 1990s
    Background
      After MPEG-4
      Bottleneck in low-bit-rate video coding
    In response
      H.264/AVC and H.265
  Asked again recently
    Background
      After the great success of H.264 and the completion of H.265
      Industrial companies are taking the lead nowadays and focusing on performance fine-tuning
    In response
      Perceptual coding?

Video Coding in Next Decade (2015-25)
  Two driving applications
    Ultra High Definition (UHD) video
      UHD video (4K or 8K) at a bit rate of 25 Mbps or above
      TV broadcasting
    Streaming video
      HD/UHD video at a rate of 1-10 Mbps
      Streaming to mobile devices
    Both can somehow be covered by the H.264/H.265 standards
  Will video coding R&D still be vibrant?
    Business perspective: if demand (video streaming applications) is still higher than supply (channel bandwidth, 4G + WiFi), there will be a need
    Technology perspective: any new idea?

New Quality Metrics
  Mathematically driven
    Mostly data fitting with a parametric model
    QA databases are needed for parameter/weight training
    Examples: SSIM, FSIM, MMF, etc.
      Lin and Kuo, "Perceptual visual quality metrics: a survey," Journal of Visual Communication and Image Representation, 2011.
    No psychovisual support
    Coarse-grain quality metrics
  Psychovisual model driven
    Built upon human visual system (HVS) characteristics
      Hu, et al., "Compressed image quality metric based on perceptually weighted distortion," IEEE T-IP, Vol. 24, No. 12, December 2015.
      Hu, et al., "Objective video quality assessment based on perceptually weighted mean-squared-error," to appear in IEEE T-CSVT.

Nothing is expected to be dramatically different along this direction
FRESH IDEA IS ESSENTIAL!

Perceived Quality: Continuous or Discrete?
  JPEG (libjpeg) quality factor: 1 to 100, plus the original = 101 quality levels
  Can you differentiate all of them?

Stair Quality Function (SQF)
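The stair idea can be sketched as a piecewise-constant map from quality factor to perceived level: each JND point crossed while lowering the QF adds one step. The function name `stair_quality_level` and the JND locations in the usage note are hypothetical illustrations, not values taken from the dataset.

```python
from bisect import bisect_left


def stair_quality_level(qf, jnd_points):
    """Return the perceived quality level (0 = indistinguishable from the source).

    jnd_points: QF values at which a just-noticeable drop occurs
    (hypothetical input, any order). An image coded exactly at a JND
    point is counted as already one step below the previous level.
    """
    pts = sorted(jnd_points)  # ascending QF
    # Every JND point at or above qf has been crossed on the way down.
    return len(pts) - bisect_left(pts, qf)
```

With hypothetical JND points at QF 60, 40 and 20, all QFs from 61 to 100 map to level 0, QFs from 41 to 60 map to level 1, and so on, so 101 coded versions collapse into 4 perceived levels.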

Just-Noticeable-Difference (JND) Revisited
  1. Anchor
  2. Non-noticeable
  3. Just noticeable
  4. Noticeable

From QoS to QoE
  Quality of Service (QoS): system centric
  Quality of Experience (QoE): human centric

MCL-JCI Dataset
  Media Communications Lab (MCL) JND-based Compressed Image (JCI) Dataset
  50 source images
  Resolution: 1920x1080
  Source image + 100 JPEG-coded images with QF = 1, ..., 100
  Each was viewed by 30 subjects
  Just released on the MCL website
    http://mcl.usc.edu/mcl-jci-dataset/

50 Source Images in MCL-JCI Dataset
  Image resolution: 1920x1080
  Each is coded by JPEG with QF = 1, ..., 100
  Total dataset size: 50 x 101 = 5,050 images

SI & Colorfulness Distribution of Source Images

Semantics and Property-Specific Categories
  Category         Number
  People           5
  Animals          3
  Plants           4
  Buildings        8
  Water or Lake    5
  Sky              3
  Bridge           3
  Boats or Cars    5
  Indoor           8
  Dark Scene       6

Subjective Tests
  Participants
    About 150 people
    Age: 20-40
    Each source image viewed by 30 subjects
  Controlled environment
    Viewing distance: 2 meters (1.6 times the picture height)
    Displayed on a 65-inch TV with a native resolution of 3840x2160
    Two images displayed side by side: anchor and comparison images

Sequential JND Point Search

Bisection Search
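One way a bisection search for a single JND point could work is sketched below. It assumes the viewer's answer is monotone in QF: once a difference from the anchor is noticeable, it stays noticeable at every lower QF. The callback `can_notice` is a hypothetical stand-in for a subject's response in the side-by-side comparison, and this is not necessarily the exact test protocol used for the dataset.

```python
def find_jnd_bisection(anchor_qf, lowest_qf, can_notice):
    """Return the highest QF below anchor_qf that the viewer can tell apart
    from the anchor, or None if even lowest_qf looks the same.

    can_notice(anchor_qf, qf) -> bool is a hypothetical viewer response.
    """
    if not can_notice(anchor_qf, lowest_qf):
        return None
    # Invariant: the difference is noticeable at lo and not noticeable at hi
    # (the anchor compared with itself is never noticeable).
    lo, hi = lowest_qf, anchor_qf
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if can_notice(anchor_qf, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Under the monotonicity assumption, one subject needs only about log2(100), roughly 7, comparisons per JND point instead of up to 100. The returned point can then serve as the new anchor when searching for the next JND point.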

Statistics of JND Numbers and Highest/Lowest QF Values
  Left scale
    Mean of JND numbers (blue bar): 5-9
    Standard deviation of JND numbers (green bar): 1-4
  Right scale
    Highest QF values of JND points (yellow curve): 36-88
    Lowest QF values of JND points (orange curve): 5

Number of Levels = 5 (minimum cases)

Number of Levels = 8 (maximum cases)

JND Distribution
  JND is a random variable
    Different from person to person
    Highly dependent on visual content and test environment

JND Processing Pipeline
  GMM (Gaussian Mixture Model)
    JND number: the number of mixtures
    JND location: the mean of each mixture
    JND height: the weight of each mixture
  Stages shown: JND histogram, processed JND points, SQF
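The mapping from a fitted mixture to JND points can be illustrated with a small 1-D Gaussian mixture fit. This is a minimal sketch using plain EM with a fixed number of components and a deterministic quantile initialisation; these are assumptions made for illustration, and the function name `fit_gmm_1d` is hypothetical. How the number of mixtures is chosen is not shown here.

```python
import math


def fit_gmm_1d(samples, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns a list of (weight, mean, std) tuples sorted by mean. In the
    slide's terms: mean = JND location, weight = JND height.
    """
    xs = sorted(samples)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    spread = (xs[-1] - xs[0]) / (2 * k) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                 for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weight, mean and std of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            # Floor on std keeps a component from collapsing onto one sample.
            stds[j] = max(math.sqrt(var), 0.5)
    return sorted(zip(weights, means, stds), key=lambda c: c[1])
```

Given a histogram of the QF values at which subjects reported a difference, each returned component would correspond to one candidate JND point.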

Variance of JND
  Variance
    Can be used to determine the Q-function
    The tail region indicates the percentage of viewers who can tell the difference between the two quality levels (unsatisfied viewers)
  JND-based rate control
    Smaller variance: easier to meet the need of most viewers
  Figure panels: small variance, large variance
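The role of the variance can be made concrete with the Gaussian tail probability. The sketch below assumes the JND location across viewers is modelled as a normal distribution over QF with a given mean and standard deviation; the function names and the numbers in the usage note are hypothetical.

```python
import math


def q_function(z):
    """Gaussian tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def unsatisfied_ratio(qf, jnd_mean, jnd_std):
    """Fraction of viewers whose JND point lies above the coded QF,
    i.e. who can already see the difference (Gaussian model assumed)."""
    return q_function((qf - jnd_mean) / jnd_std)
```

For a hypothetical JND mean of QF 50, coding at QF 60 leaves about 2.3% of viewers unsatisfied when the standard deviation is 5, and far fewer when it is 2, which is why a smaller variance makes it easier to meet the need of most viewers.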

Extension from Image to Video
  Divide a video sequence into smaller intervals that have similar content
  Apply the JND idea to each interval
  The JND becomes a time-varying function

SQF for FoxBird of 5-Second Length

MCL-JCV Dataset
  Media Communications Lab (MCL) JND-based Compressed Video (JCV) Dataset
  30 video clips
  Duration: 5 seconds
  Frame rate: 24, 25 or 30 fps
  Resolution: 1920x1080
  Each was viewed by 50 subjects
  All data have been collected; they are under analysis now and will be released soon

Video Sources
  Representative thumbnail frames of 30 selected source sequences

Stair Quality Function (SQF) and Rate Control
  Rate control via satisfied user ratio (SUR)
    The coding bit rate is determined by the perceptual experience of a large audience (rather than a few golden eyes)
    Choose the proper rate so that X% cannot see the difference while (100-X)% can -> X% is called the satisfied user ratio (SUR)
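The SUR criterion can be sketched directly from per-subject measurements. The code below assumes each subject's first JND is recorded as a bit rate and that a subject is satisfied whenever the coded rate is above that value; the function names and the sample values in the usage note are hypothetical, not measured data.

```python
def satisfied_user_ratio(rate, first_jnd_rates):
    """Fraction of subjects who cannot see a difference at the given rate.

    A subject counts as satisfied when the rate is strictly above that
    subject's first JND rate (assumed convention).
    """
    satisfied = sum(1 for b in first_jnd_rates if rate > b)
    return satisfied / len(first_jnd_rates)


def min_rate_for_sur(target, first_jnd_rates, candidate_rates):
    """Lowest candidate rate whose SUR reaches the target, or None."""
    for rate in sorted(candidate_rates):
        if satisfied_user_ratio(rate, first_jnd_rates) >= target:
            return rate
    return None
```

With ten hypothetical first-JND rates of 2, 3, 3, 4, 5, 5, 6, 7, 8 and 12 Mbps, a target SUR of 75% is first reached at 8 Mbps among integer candidate rates; satisfying the last subject as well would cost 13 Mbps.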

From PSNR to SUR
  What is the desired bit rate for certain content within a short interval?
  We can use the measured 1st JND statistics to answer this question
  The 1st JND point -> the transition from perceptually lossless to perceptually lossy

Histogram of 1st JND for Kimono
  Raw samples
  Cleaned samples (same as raw, no outliers)

Video Source: Kimono (slide 26)
  What is the desired bit rate for Kimono?
  27 Mbps provides a good tradeoff between the SUR and the bandwidth requirement, as read from the right figure
  No such information is available from the left figure

Kimono 85Mbps

Kimono 20Mbps

Histogram of 1st JND for CarFirework
  Histogram of raw samples
  Histogram of cleaned samples (1 outlier was removed)

Video Source: CarFirework
  What is the desired bit rate for CarFirework?
  19 Mbps provides a good tradeoff between the SUR (80%) and bandwidth

CarFirework 44Mbps

CarFirework 19Mbps

Histogram of 1st JND for FoxBird
  Histogram of raw samples
  Histogram of cleaned samples (2 outliers were removed)

Video Source: FoxBird
  What is the desired bit rate for FoxBird?
  2.6 Mbps provides a good tradeoff between the SUR (85%) and bandwidth
  The SUR versus bit rate relationship is highly content dependent

FoxBird 4.1Mbps

FoxBird 2.6Mbps

Prediction of the 1st JND Distribution
  Linking perceptual video coding to big data and machine learning
    Feature extraction
    Machine learning
  Need a training dataset
    Diversified content

Completely New Ball Game
  Insufficiency of rate-distortion theory
    Source coding theory or joint source-channel coding theory does not take human perception into account
  Human perception is a statistical phenomenon
    Varying between individuals
  HVS is a nonlinear system
    The system fires only when the signal is above a certain threshold
    Foundation of JND

Future R&D (1)
  Prediction of SQF for arbitrary image/video content
    Machine learning
    Training: demands ground-truth data (MCL-JCI and MCL-JCV)
    The first JND point is the most critical one
      Mean (location) and spread (standard deviation)
    Time varying: demands dynamic prediction
  Rate control can be done based on the satisfied user ratio (SUR) criterion
  Develop a JND-based perceptual coder
    Push the first JND point as far as possible (smaller QF or larger QP)
    May lead to the next-generation video coding standard

Future R&D (2)
  Other JND-based QoE applications
    Image/video super-resolution
    Image/video retargeting
    Image/video content delivery

Acknowledgements
  USC: Haiqiang Wang, Sudeng Hu, Joe Yuchieh Lin, Lina Jin
  Netflix: Ioannis Katsavounidis, Anne Aaron, David Ronca

Related Publications
  MCL-JCV dataset
    Haiqiang Wang, Weihao Gan, Sudeng Hu, Joe Yuchieh Lin, Lina Jin, Longguang Song, Ping Wang, Ioannis Katsavounidis, Anne Aaron and C.-C. Jay Kuo, "MCL-JCV: a JND-based H.264/AVC video quality assessment dataset," IEEE ICIP, Phoenix, Arizona, USA, September 25-28, 2016.
  GMM-based stair quality model
    Sudeng Hu, Haiqiang Wang and C.-C. Jay Kuo, "A GMM-based stair quality model for human perceived JPEG images," IEEE ICASSP, Shanghai, China, March 20-25, 2016.
  MCL-JCI dataset
    Lina Jin, Joe Yuchieh Lin, Sudeng Hu, Haiqiang Wang, Ping Wang, Ioannis Katsavounidis, Anne Aaron and C.-C. Jay Kuo, "Statistical Study on Perceived JPEG Image Quality via MCL-JCI Dataset Construction and Analysis," IS&T Conference on Human Vision and Electronic Imaging (HVEI), San Francisco, CA, USA, February 14-18, 2016.
  JND-based quality measure of coded image/video
    Joe Yuchieh Lin, Lina Jin, Sudeng Hu, Ioannis Katsavounidis, Anne Aaron and C.-C. Jay Kuo, "Experimental Design and Analysis of JND Test on Coded Image/Video," SPIE Optical Engineering + Applications, International Society for Optics and Photonics, August 12, 2015.