Into the Depths: The Technical Details Behind AV1. Nathan Egge Mile High Video Workshop 2018 July 31, 2018

Into the Depths: The Technical Details Behind AV1 Nathan Egge <negge@mozilla.com> Mile High Video Workshop 2018 July 31, 2018

North America Internet Traffic
- Video: 82% of Internet traffic by 2021 (Cisco study)

Alliance for Open Media (AOM)
Goals of the Alliance:
- Produce a video codec for a broad set of industry use cases
  - Video on Demand / Streaming
  - Video Conferencing
  - Screen sharing
  - Video game streaming
  - Broadcast
- Open Source and Royalty Free
- Widely supported and adopted
- At least 30% better than current generation video codecs

AV1 Coding Tools Overview
- New high-level syntax
  - Easily parsed sequence header, frame header, tile header, etc.
- New adaptive multi-symbol entropy coding
  - Up to 16 possible values per symbol
- New coefficient coder
  - LV-MAP exploits multi-symbol arithmetic coder
- More block sizes
  - Prediction blocks from 128x128 down to 4x4
  - Rectangular blocks
    - 1:2 and 2:1 ratios (4x8, 8x4, etc.)
    - 1:4 and 4:1 ratios (4x16, 16x4, etc.)
  - Transform sizes from 64x64 down to 4x4
    - Includes rectangular transforms with 1:2, 2:1 and 1:4, 4:1 ratios
- More transform types
  - 16 possible transform types
  - Row and column chosen from: IDTX, DCT, DST, ADST
- More references
  - Up to 7 per frame (out of a store of 8)
- Spatial and temporal scalability
- Lossless mode
- Chroma subsampling
  - 4:4:4, 4:2:2, 4:2:0, monochrome
- More prediction modes
  - Intra
    - 8 main directions plus delta for up to 56 directions
    - Smooth HV modes interpolate across block
    - Palette mode with index map of up to 8 colors
    - Chroma from Luma intra predictor
    - Intra Block Copy
  - Inter
    - Expanded reference list (up to 7 per frame)
    - Allow ZEROMV predictor, which isn't always (0,0)
    - Compound mode
    - Inter-Intra prediction
    - Depends on difference between pixel prediction
    - Smooth blending limited to certain intra modes
    - Wedge codebook (Inter-Inter, or Inter-Intra)
    - Warped motion: local affine model with neighbors
    - Global motion: affine model across entire frame
- Loop filtering
  - Deblocking filter
  - Constrained Directional Enhancement Filter
  - Loop restoration
- Film grain synthesis
Full AV1 Specification: https://aomediacodec.github.io/av1-spec/

This list is too long to cover in 30 minutes! Let's try.

Profiles
- Main: 8-bit and 10-bit; 4:0:0 and 4:2:0 chroma subsampling
- High: 8-bit and 10-bit; 4:0:0, 4:2:0 and 4:4:4 chroma subsampling
- Professional: 8-bit, 10-bit and 12-bit; 4:0:0, 4:2:0, 4:2:2 and 4:4:4 chroma subsampling
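The profile table above can be read as a lookup from (bit depth, chroma subsampling) to the least capable profile that covers the format. The following is a minimal sketch of that lookup; the function name `smallest_profile` and the string encoding of the subsampling layouts are illustrative assumptions, not names from the AV1 specification.

```python
from typing import Optional

# Capabilities per profile, transcribed from the Profiles slide.
# Ordered from least to most capable.
PROFILES = [
    ("Main", {8, 10}, {"4:0:0", "4:2:0"}),
    ("High", {8, 10}, {"4:0:0", "4:2:0", "4:4:4"}),
    ("Professional", {8, 10, 12}, {"4:0:0", "4:2:0", "4:2:2", "4:4:4"}),
]


def smallest_profile(bit_depth: int, subsampling: str) -> Optional[str]:
    """Return the first listed profile that supports the format, or None."""
    for name, depths, layouts in PROFILES:
        if bit_depth in depths and subsampling in layouts:
            return name
    return None


if __name__ == "__main__":
    for fmt in [(8, "4:2:0"), (10, "4:4:4"), (8, "4:2:2"), (12, "4:2:0")]:
        print(fmt, "->", smallest_profile(*fmt))
```

Note that, going by the slide, 4:2:2 content lands in the Professional profile even at 8 bits.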

Levels
For a given sequence, place limits on:
- frame size (width and height)
- maximum picture size (area in samples)
- maximum display rate (samples per second)
- maximum decode rate (samples per second)
- average rate (Mbits per second)
- high rate (Mbits per second)
- maximum number of tiles
- maximum number of tile columns

High Level Syntax
(Diagram: Sequence Header, Frame Header, Tile Group (Tile, Tile), Tile Group (Tile, Tile))

Colors and HDR
- Colorspace, color matrix, transfer functions, etc. can be encoded directly in the bitstream
  - Chroma siting and levels too
- HDR metadata can be added through the Metadata OBU syntax

Codecs 101
(Diagram: Loop Filter, Prediction, Transform, Quantization, Entropy Coding)

Multi-Symbol Entropy Coder
- Arithmetic Range Coder
  - Code both binary symbols and multi-symbols
  - Alphabet sizes up to 16
- Improve EC throughput with high rate streams
  - Instead of 1 bit per cycle, decode up to 4
- Use 8x9 -> 17 bit multiplies when coding
  - 15-bit CDFs shifted down before multiply
  - Adaptation still occurs with 15-bit precision
- Fast adaptation mode for first few symbols
(Figure: coding symbols A, B, C, D with binary 0/1 decisions versus a single multi-symbol decision)
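To make the per-symbol adaptation concrete, here is a minimal sketch of an adaptive cumulative distribution function (CDF) held at 15-bit precision. It shows only the probability update, not the range coder itself. The rate schedule (a smaller shift, hence faster adaptation, while few symbols have been seen) is an illustrative assumption rather than the exact rule from the AV1 specification, and the helper names are hypothetical.

```python
CDF_ONE = 1 << 15  # 15-bit probability precision, as stated on the slide


def uniform_cdf(n):
    """CDF for n equally likely symbols; cdf[i] is P(symbol <= i) scaled by 2**15."""
    return [CDF_ONE * (i + 1) // n for i in range(n)]


def probability(cdf, symbol):
    """Scaled probability of one symbol, recovered from the CDF."""
    low = cdf[symbol - 1] if symbol > 0 else 0
    return cdf[symbol] - low


def adapt(cdf, symbol, count):
    """Move the CDF toward the symbol just coded, in place.

    `count` is how many symbols this context has already coded. The
    thresholds below are assumed for illustration: a smaller shift
    means a larger step, so early symbols adapt faster.
    """
    rate = 4 + (count > 15) + (count > 31)
    for i in range(len(cdf) - 1):  # the last entry always stays at CDF_ONE
        if i >= symbol:
            cdf[i] += (CDF_ONE - cdf[i]) >> rate  # raise mass at or above symbol
        else:
            cdf[i] -= cdf[i] >> rate  # lower mass below symbol


if __name__ == "__main__":
    cdf = uniform_cdf(4)
    for n in range(3):
        adapt(cdf, 2, n)
        print(cdf, "P(2) =", probability(cdf, 2) / CDF_ONE)
```

An alphabet of 16 symbols carries up to log2(16) = 4 bits per decoded symbol, which is where the "up to 4" bits per cycle figure on the slide comes from.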

Transform Types
- VP9 has two types: DCT and ADST
  - Chosen independently for horizontal / vertical directions
  - Signaled once per prediction block
- AV1 has four types:
  - DCT
  - ADST
  - FlipADST (mirror image of ADST)
  - Identity (no transform)
- Still chosen independently for horizontal / vertical directions
  - Total of 16 possible combinations
  - Not all combinations allowed in all contexts (e.g., no FlipADST for intra)
- Signaled once per transform block
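The count of 16 follows from choosing one of four 1-D kernels independently for each direction. The sketch below enumerates the pairs and applies the single restriction the slide gives as an example (no FlipADST for intra). The real per-context rules are more detailed than this one filter, so the intra subset printed here is only an illustration of that example.

```python
from itertools import product

# The four 1-D kernels named on the slide.
KERNELS_1D = ["DCT", "ADST", "FlipADST", "Identity"]

# Every (vertical, horizontal) pairing of 1-D kernels: 4 * 4 = 16.
COMBINATIONS = list(product(KERNELS_1D, repeat=2))


def allowed_for_intra(pair):
    """Apply only the slide's example restriction: no FlipADST in intra blocks."""
    return "FlipADST" not in pair


if __name__ == "__main__":
    print(len(COMBINATIONS), "combinations in total")
    intra = [p for p in COMBINATIONS if allowed_for_intra(p)]
    print(len(intra), "remain after dropping FlipADST")
```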

Prediction Block Structure
- 10 different splitting modes
- Last (4-way) split is recursive

Transform Block Sizes: Intra
- Signaling mostly unchanged from VP9
  - One transform size per prediction block
- For rectangular prediction blocks, the largest rectangular transform that fits is allowed, e.g., 1:2, 2:1, 4:1 and 1:4 ratio transform blocks
- Transform sizes go up to 64x64
  - Only upper left 32x32 region allowed to be non-zero

Transform Block Sizes: Inter
- Signaling completely different from VP9
  - Four-way quadtree splitting
- For rectangular prediction blocks, the largest rectangular transform that fits is also allowed
- Available sizes same as intra

Intra Prediction Modes
- More directional modes
  - 8 main directions plus delta for up to 56 directions
  - Not all modes available at smaller sizes
- Smooth H + V modes
  - Smoothly interpolate between values in left column (resp. above row) and last value in above row (resp. left column)
- Paeth predictor mode
- Palette mode
  - Color index map with up to 8 colors
  - Separate palettes for Y, U and V planes
  - Palette index coded using context model for each pixel in the block
  - Pixels predicted in wavefront order to allow parallel computation
- Chroma from Luma
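Two items from this list are easy to show in a few lines. The first is the Paeth predictor, which picks whichever of the left, above and above-left neighbors is closest to the gradient estimate left + above - above_left. The second is the arithmetic behind "up to 56 directions". The nominal angles and the 3-degree step with deltas from -3 to +3 are taken from the AV1 specification as I understand it, not from the slide, so treat those constants as assumptions.

```python
def paeth(left, above, above_left):
    """Paeth predictor for one pixel.

    Returns the neighbor nearest to the gradient estimate. min() keeps the
    first candidate on ties, so ties favor left, then above.
    """
    base = left + above - above_left
    return min((left, above, above_left), key=lambda p: abs(base - p))


# Assumed constants (from the AV1 specification, not the slide).
NOMINAL_ANGLES = [45, 67, 90, 113, 135, 157, 180, 203]  # 8 main directions
ANGLE_STEP = 3                                           # degrees per delta
DELTAS = range(-3, 4)                                    # -3 .. +3


def all_prediction_angles():
    """Every nominal angle offset by every delta: 8 * 7 = 56 angles."""
    return sorted(a + d * ANGLE_STEP for a in NOMINAL_ANGLES for d in DELTAS)


if __name__ == "__main__":
    print(paeth(10, 20, 10))
    print(len(all_prediction_angles()), "directional angles")
```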

Chroma from Luma Intra Prediction
- Predict chroma channel based on decoded luma
- Encoder signals best correlation constants: αcb and αcr
- Good for screen content or scenes with fast motion
(Figure axes: αcb, -αcb, αcr, -αcr)

Chroma from Luma Algorithm
(Block diagram labels:)
- Reconstructed Luma Pixels
- Subsample
- Signaled Scaling Factor α (Q3)
- DC_PRED (Q0)
- Average
- Transform-Sized Averages (Q3)
- Contribution to the AC (in the spatial domain)
- Scaled Values (Q0)
- CfL Prediction
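The diagram labels above suggest the following data flow: subsample the reconstructed luma, remove its average to get the AC contribution, scale by the signaled α, and add the result to DC_PRED. The sketch below follows that flow for 4:2:0 content under several assumptions of mine: luma is subsampled by summing each 2x2 block and doubling (which yields a Q3 value), the average is taken over the whole block, rounding is symmetric around zero, and samples are 8-bit. The function name and argument layout are hypothetical.

```python
def cfl_predict(luma, dc_pred, alpha_q3, max_value=255):
    """Chroma-from-Luma prediction sketch for one 4:2:0 block.

    luma     : 2-D list of reconstructed luma samples (even width and height)
    dc_pred  : DC prediction for the chroma block (Q0)
    alpha_q3 : signaled scaling factor in Q3 (e.g. 4 means 0.5)
    """
    h, w = len(luma) // 2, len(luma[0]) // 2

    # Subsample: 2 * (sum of a 2x2 block) == 8 * average, i.e. Q3 precision.
    sub_q3 = [
        [2 * (luma[2 * y][2 * x] + luma[2 * y][2 * x + 1]
              + luma[2 * y + 1][2 * x] + luma[2 * y + 1][2 * x + 1])
         for x in range(w)]
        for y in range(h)
    ]

    # Block average in Q3, rounded to nearest.
    n = w * h
    avg_q3 = (sum(map(sum, sub_q3)) + n // 2) // n

    pred = []
    for row in sub_q3:
        out = []
        for v in row:
            scaled_q6 = alpha_q3 * (v - avg_q3)  # Q3 * Q3 = Q6
            mag = (abs(scaled_q6) + 32) >> 6     # round Q6 down to Q0
            ac = mag if scaled_q6 >= 0 else -mag
            out.append(max(0, min(max_value, dc_pred + ac)))
        pred.append(out)
    return pred


if __name__ == "__main__":
    luma = [[10, 10, 30, 30]] * 4
    print(cfl_predict(luma, dc_pred=128, alpha_q3=4))
```

With a flat luma block the AC contribution is zero everywhere, so the prediction falls back to DC_PRED.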

UV Mode Selection Example (https://goo.gl/6tkab8)
- CFL_PRED: 17%
- DC_PRED: 44.36%
- TM_PRED: 7.98%
- SMOOTH_PRED: 4.85%
Ohashi0806shield.y4m, QP = 55

Awesome for Gaming (Twitch dataset)

BD-Rate (%)  PSNR   PSNR-HVS  SSIM   CIEDE2000  PSNR Cb  PSNR Cr  MS SSIM
Average      -1.01  -0.93     -0.90  -5.74      -15.55   -9.88    -0.81

https://arewecompressedyet.com/?job=no-cfl-twitch-cpu2-60frames%402017-09-18t15%3a39%3a17.543z&job=cfl-inter-twitch-cpu2-60frames%402017-09-18t15%3a40%3a24.181z

Notable Mentions

BD-Rate (%)  PSNR   PSNR-HVS  SSIM   CIEDE2000  PSNR Cb  PSNR Cr  MS SSIM
Minecraft    -3.76  -3.13     -3.68  -20.69     -31.44   -25.54   -3.28
GTA V        -1.11  -1.11     -1.01  -5.88      -15.39   -5.57    -1.04
Starcraft    -1.41  -1.43     -1.38  -4.15      -6.18    -6.21    -1.43

Clips: MINECRAFT_10_120f.y4m (Minecraft), GTAV_0_120f.y4m (GTA V), STARCRAFT_10_120f.y4m (Starcraft)

Motion Vector Coding
- Each frame has a list of 7 previous frames to reference (out of a pool of 8)
  - Can reference non-displayed frames, so many possible structures
- Construct list of top 4 MVs for a given reference / reference pair from neighboring area
- Complicated entropy coding scheme

Compound Prediction
- (½, ½) weights like VP9
- Inter-inter compound segment
  - Pixel weights depend on difference between prediction pixels
- Inter-intra gradual weighting
  - Smoothly blends from inter to intra prediction
  - Only a limited set of intra modes allowed (DC, H, V, Smooth)
- Wedge codebook (inter-inter or inter-intra)

Global Motion
- Defines up to a 6-parameter affine model for the whole frame (translation, rotation and scaling)
- Blocks can signal to either use the global motion vector or code a motion vector like normal
- If global motion isn't used, default is 0,0
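A 6-parameter affine model maps each position (x, y) to a new position, and the implied motion vector is the difference between the two. The sketch below uses floating point and a hypothetical parameter order (a, b, tx, c, d, ty); the bitstream representation is fixed-point and is not reproduced here.

```python
import math


def affine_motion_vector(params, x, y):
    """Motion vector implied at (x, y) by a 6-parameter affine model.

    params = (a, b, tx, c, d, ty), an assumed ordering for this sketch:
        x' = a * x + b * y + tx
        y' = c * x + d * y + ty
    Returns (x' - x, y' - y).
    """
    a, b, tx, c, d, ty = params
    return (a * x + b * y + tx - x, c * x + d * y + ty - y)


def rotation_zoom(angle_deg, scale, tx=0.0, ty=0.0):
    """Build affine parameters for rotation plus uniform scaling plus translation."""
    cos_t = scale * math.cos(math.radians(angle_deg))
    sin_t = scale * math.sin(math.radians(angle_deg))
    return (cos_t, -sin_t, tx, sin_t, cos_t, ty)


if __name__ == "__main__":
    pan = (1.0, 0.0, 2.5, 0.0, 1.0, -1.0)  # pure translation
    zoom = rotation_zoom(0.0, 2.0)         # 2x zoom about the origin
    for px, py in [(0, 0), (3, 4), (64, 64)]:
        print((px, py), affine_motion_vector(pan, px, py),
              affine_motion_vector(zoom, px, py))
```

A pure translation gives the same vector at every position, while zoom or rotation gives a vector that varies across the frame, which is what a single per-frame model can capture and a single translational vector cannot.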

Warped Motion
- Use neighboring blocks to define same motion model within a block
- Decomposed into two shears with limited range
  - Similar complexity to subpel interpolation

Segmentation IDs
- Up to 8 possible segment labels (3 bits)
- Value set per label, e.g., filter strength, quantizer, reference frame, skip
- Signaled per prediction block, down to 8x8
- Can either predict segment ID temporally or spatially (chosen per frame)
- Spatial prediction
  - Used to change quantizer/loop filter strength
  - Useful for adaptive quantization, e.g., for activity masking
  - Useful for temporal RDO, e.g., MV-tree
- Temporal prediction
  - Useful for predicting temporal properties, e.g., skip

Deblocking Filter
- Similar to what is in VP9
  - Changed the order in which edges are filtered to simplify hardware
- More flexible strength signaling
  - Separate H + V strength for luma
  - Separate Cb and Cr strengths for chroma
  - Can be adjusted on a per-superblock basis
- NB: deblocking filter crosses tile boundaries

Constrained Directional Enhancement Filter (CDEF)
- Merge of Daala's directional deringing filter (DERING) and Thor's constrained lowpass filter (CLPF)
- Both encoder and decoder search for the direction that best matches
- Primary filter run along direction, and secondary conditional replacement filter run orthogonally
- Strength is signaled in the bitstream
- Results exceed both DERING and CLPF alone, as well as applying DERING + CLPF sequentially
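The slide does not spell out what "constrained" means. As described in the published CDEF literature, each filter tap contributes the difference between a neighbor and the center pixel only after that difference has passed through a constraint function, so that large differences (likely real edges) are attenuated to zero. The sketch below shows that function; the name `constrain` and the argument names are my own, and the damping behavior should be checked against the specification before relying on it.

```python
def constrain(diff, strength, damping):
    """Limit the influence of one neighbor on the filtered pixel.

    diff     : neighbor value minus center pixel value
    strength : filter strength (0 disables the tap entirely)
    damping  : controls how quickly large differences are ignored

    Small differences pass through unchanged, moderate ones are capped
    near `strength`, and large ones fall to zero.
    """
    if strength == 0:
        return 0
    # floor(log2(strength)) computed with integer operations.
    shift = max(0, damping - (strength.bit_length() - 1))
    magnitude = min(abs(diff), max(0, strength - (abs(diff) >> shift)))
    return magnitude if diff >= 0 else -magnitude


if __name__ == "__main__":
    for d in (-40, -10, -3, 0, 3, 10, 40):
        print(d, "->", constrain(d, strength=4, damping=5))
```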

Loop Restoration
- Enhanced and simplified loop filters from VP10
- Two filter choices per superblock
  - Separable Wiener filter with explicitly coded coefficients
  - Self-guided filter
- Runs in a separate pass after CDEF
  - Showed best metrics of any approach tested
  - Uses deblocking filter output outside of superblock boundaries to minimize line buffers

Spatial and Temporal Scalability
- Each frame can have a spatial_id and a temporal_id
  - When spatial_id = 0 and temporal_id = 0 it is called a base layer
  - When spatial_id > 0 or temporal_id > 0 it is called an enhancement layer
- Idea is that decoder will simply display the frames from the highest layer
  - Higher layer frames can reference lower layer frames
- Designed to be used by a special Selective Forwarding Unit server that hands out the appropriate scalable layer to a client
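The forwarding decision a Selective Forwarding Unit makes can be sketched as a filter over frame labels: a client that can handle layers up to some (spatial, temporal) pair receives every frame at or below it, since higher layers may reference lower ones. The `Frame` record and `forward` function are hypothetical names for illustration; a real server would operate on OBUs and their extension headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int
    spatial_id: int
    temporal_id: int


def forward(frames, max_spatial_id, max_temporal_id):
    """Keep the frames a client at the given operating point needs.

    Lower layers are kept as well because higher-layer frames can
    reference them.
    """
    return [
        f for f in frames
        if f.spatial_id <= max_spatial_id and f.temporal_id <= max_temporal_id
    ]


if __name__ == "__main__":
    stream = [
        Frame(0, 0, 0), Frame(1, 1, 0),  # base layer plus a spatial enhancement
        Frame(2, 0, 1), Frame(3, 1, 1),  # temporal enhancement of both
    ]
    print([f.index for f in forward(stream, 0, 0)])  # base layer only
    print([f.index for f in forward(stream, 1, 1)])  # everything
```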

Frame Super-Resolution
- Not actually super-resolution
- Instead:
  - Code at reduced resolution
  - Run deblocking filter and CDEF, but not Loop Restoration filter
  - Upsample with simple upscaler
  - Run Loop Restoration filter at full resolution
- Only horizontal resolution reduction allowed
  - Simplifies hardware (no new line buffers)
- Allows for gradual bitrate scaling

Film Grain Synthesis
- Grain parameters signaled per frame
- Synthesized film grain applied after decoding (not in loop)
- Could be applied using GLSL + PRNG based texture

AOM Members / Hardware

Designed for Hardware Implementations
- Hardware members involved from the very beginning
- Feedback incorporated into a number of tools
  - Per symbol probability adaptation
  - Smaller multipliers in entropy coder
  - Single pass bitstream writing
  - Fewer line buffers in CDEF and LR
  - Only allow horizontal scaling for super-resolution

AOM Members / Real-Time Conferencing

Designed for Low-Latency
- Per symbol adaptation replaces symbol counts in VP9
- Can write bitstream with subframe latency
  - Removed signaling from frame header that forced whole frame buffering

Designed for Broadcasters?
- Decoder rate model
  - Guarantee buffer size
  - Limit the use of alt-refs to ensure decodability
  - Verifiable (see Annex E of the spec document)
- Support for AV1 coming to hardware
  - Smart TVs will want to play Netflix, Hulu, YouTube, etc.
- Start with AV1 in the broadcasting stack
  - Can leverage industry investment in hardware, software, tooling, etc.
  - Easier to expand into streaming market

Moscow State University (SSIM - June 2017)
http://www.compression.ru/video/codec_comparison/hevc_2017/msu_hevc_comparison_2017_p5_hq_encoders.pdf

Facebook Study (April 2018)
https://code.fb.com/video-engineering/av1-beats-x264-and-libvpx-vp9-in-practical-use-case/

AV1 Compression History

AV1 Complexity History

Questions?