Distributed Multimedia Systems. 2.Coding. László Böszörményi Distributed Multimedia Systems Coding - 1


Audio Encoding - Basics
- Audio (sound) wave: a one-dimensional acoustic (pressure) wave
  - Causes vibration in the eardrum or in a microphone
- Frequency range of the human ear: 20 Hz to 20,000 Hz (20 kHz)
- Perception is nearly logarithmic; the relation of two amplitudes A and B is expressed in decibels as $\mathrm{dB} = 20 \log_{10}(A/B)$

Sound source                  Level
Very low pressure (20 μPa)    0 dB
Conversation                  50-60 dB
Heavy traffic                 80 dB
Rock band                     120 dB
Pain threshold                130 dB

Nyquist's Theorem (1924)
- If the highest frequency of a signal is H [Hz] (filtered at H), then the signal can be reconstructed from 2H samples/sec
- If a channel can take V different values, the maximum data rate of the channel is $b^{NY}_{max} = 2H \log_2 V$ [bps]
- Example
  - H = 3,000 Hz (voice-grade line)
  - V = 2 (binary signals, $\log_2 V = 1$)
  - $b^{NY}_{max} = 6000$ bps = 6 Kbps
- Remark
  - Noiseless channels do not exist
  - Higher data rates can be achieved by tricky encoding!

Shannon's Theorem (1948)
- If the bandwidth of a noisy channel (subject to, e.g., thermal noise) is H [Hz] and the signal-to-noise ratio (SNR) is S/N, the maximum data rate of the channel is $b^{SH}_{max} = H \log_2(1 + S/N)$ [bps]
- If S << N, then S/N → 0 and $b^{SH}_{max} = 0$ (since $\log_2(1) = 0$)
- This result is independent of signal levels / encoding!
- Example (cont'd)
  - H = 3,000 Hz
  - S/N = 1,000, i.e. SNR = 30 dB, with SNR = $10 \log_{10}(S/N)$ [dB] and $\log_{10}(10^3) = 3$
  - $b^{SH}_{max} = 3{,}000 \cdot \log_2(1 + 1000) \approx 30{,}000$ bps (since $\log_2(1024) = 10$)
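The Nyquist and Shannon examples can be checked numerically. The following is a minimal sketch; the function names are illustrative and do not come from the slides.

```python
import math

def nyquist_rate(bandwidth_hz, levels):
    """Maximum data rate of a noiseless channel: 2 * H * log2(V) [bps]."""
    return 2 * bandwidth_hz * math.log2(levels)

def shannon_capacity(bandwidth_hz, snr_linear):
    """Maximum data rate of a noisy channel: H * log2(1 + S/N) [bps]."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def snr_db(snr_linear):
    """Signal-to-noise ratio in decibels: 10 * log10(S/N)."""
    return 10 * math.log10(snr_linear)

# Voice-grade line from the slides: H = 3000 Hz
print(nyquist_rate(3000, 2))               # binary signalling, no noise
print(snr_db(1000))                        # S/N = 1000 expressed in dB
print(round(shannon_capacity(3000, 1000)))  # close to the 30,000 bps estimate
```

The exact Shannon value lies slightly below 30,000 bps because $\log_2(1001)$ is slightly below 10.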

Analog Digital Conversion (ADC)
- Sampling of the audio wave every ΔT secs
- If the sound wave is a linear superposition of noiseless sine waves with a maximum frequency f:
  - Sampling rate = 2f, more is useless (Nyquist theorem)
  - E.g. CDs are sampled at 44.1 kHz, slightly more than 2 * 20 kHz
- Channels with noise (Shannon's theorem)
  - Maximum data rate = Bandwidth * $\log_2(1 + \text{Signal}/\text{Noise})$
- Quantization
  - The precision of the digital sample depends on the number of bits used
- Quantization noise
  - Error due to the finite number of bits/sample


Audio Encoding - Example
[Figure, three panels: a sine wave; sampling the sine wave (32 different values need 5 bits); quantizing the samples to 4 bits (16 different values possible)]
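A minimal sketch of the quantization step, assuming a uniform quantizer over the amplitude range [-1, 1] and 32 samples per period (both are illustrative choices, not taken from the figure). It shows that the quantization error never exceeds half a quantization step.

```python
import math

def quantize(x, bits):
    """Map an amplitude x in [-1, 1] to one of 2**bits integer levels."""
    levels = 2 ** bits
    return round((x + 1) / 2 * (levels - 1))

def dequantize(level, bits):
    """Map an integer level back to an amplitude in [-1, 1]."""
    levels = 2 ** bits
    return level / (levels - 1) * 2 - 1

BITS = 4
# One period of a sine wave, sampled at 32 points
samples = [math.sin(2 * math.pi * k / 32) for k in range(32)]
codes = [quantize(s, BITS) for s in samples]

# Quantization noise: difference between the sample and its reconstruction
errors = [abs(s - dequantize(c, BITS)) for s, c in zip(samples, codes)]
max_error = max(errors)
half_step = 1 / (2 ** BITS - 1)   # half of the step size 2 / 15

print(codes)
print(max_error <= half_step)
```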

Audio Encoding - Standards
- Telephone
  - 8,000 samples/sec (up to 4 kHz)
  - Needs 64 Kb/s (pulse code modulation, PCM, 8-bit samples in Europe) or 56 Kb/s (USA, Japan: 7 bits)
  - Enhancements: Differential PCM, Adaptive DPCM
- Audio CDs
  - 44,100 samples/sec (up to 20 kHz)
  - 16-bit samples: the quantization error is small but audible (the dynamic range of the ear is ca. 1 million)
  - Needs 705.6 Kb/s for mono, 1.411 Mb/s for stereo
- MP3 (MPEG-1 audio layer 3) compression
  - Based on psychoacoustic models (128 Kb/s)
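The bit rates above follow directly from sample rate times sample size. A short sketch with illustrative function names; the dB figure uses the $20 \log_{10}$ amplitude relation from the Audio Encoding - Basics slide.

```python
import math

def pcm_bit_rate(samples_per_sec, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return samples_per_sec * bits_per_sample * channels

def amplitude_ratio_db(ratio):
    """Express an amplitude ratio in dB: 20 * log10(ratio)."""
    return 20 * math.log10(ratio)

print(pcm_bit_rate(8000, 8))        # telephone, 8-bit samples
print(pcm_bit_rate(8000, 7))        # telephone, 7-bit samples
print(pcm_bit_rate(44100, 16))      # CD, mono
print(pcm_bit_rate(44100, 16, 2))   # CD, stereo

# 16 bits distinguish 65,536 levels; the ear spans a ratio of about 1 million
print(round(amplitude_ratio_db(2 ** 16), 1))
print(round(amplitude_ratio_db(1_000_000), 1))
```

The last two lines suggest why the 16-bit quantization error can still be audible: 65,536 levels cover a smaller range than the ratio of about 1 million quoted for the ear.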

Analog Video - Basics
- A sequence of images flashing faster than 50/sec gives the impression of a continuous movie
- TV (black-and-white)
  - An electron beam rapidly scans the image, from left to right and from top to bottom
  - At the end of the scan (a frame) the beam retraces
- NTSC scans 525 lines (483 effective), 30 frames/sec
- PAL and SECAM: 625 lines (576 effective), 25 frames/sec
- 25 frames/s produce smooth motion, but flicker
  - Interlacing solves this: 50 half frames (fields) / sec
  - Non-interlaced: progressive scanning

Analog Video - Example
[Figure]

Analog Video - Color
- 3 beams for the 3 additive primary colors: red, green, blue (RGB)
- RGB to YUV (similar in NTSC: YIQ)
  - Luminance (brightness): signal (or channel) Y
  - Chrominance (color): signals U and V, less resolution
  - The eye is more sensitive to luminance
- $Y = 0.30R + 0.59G + 0.11B$
- $U = (B - Y) \cdot 0.493$ (blue color difference)
- $V = (R - Y) \cdot 0.877$ (red color difference)
- Chroma subsampling
  - 4:4:4: Y, U, V same resolution
  - 4:2:2: U, V half horizontal resolution
  - 4:2:0: U, V half horizontal + vertical resolution; compression factor $3 \cdot 16 \cdot 16 / (16 \cdot 16 + 2 \cdot 8 \cdot 8) = 2$
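The conversion formulas and the subsampling ratio can be tried out directly. A minimal sketch; the function names and the 16 x 16 reference area are illustrative.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB sample to YUV with the weights given on the slide."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = 0.493 * (b - y)   # blue color difference
    v = 0.877 * (r - y)   # red color difference
    return y, u, v

def subsampling_gain(size=16, h_div=2, v_div=2):
    """Ratio of 4:4:4 sample count to subsampled sample count for a size x size area."""
    full = 3 * size * size
    chroma = (size // h_div) * (size // v_div)   # samples per chroma plane
    return full / (size * size + 2 * chroma)

print(rgb_to_yuv(255, 255, 255))   # white: full luminance, no color difference
print(rgb_to_yuv(255, 0, 0))       # pure red: large positive V
print(subsampling_gain(16, 2, 2))  # 4:2:0
print(subsampling_gain(16, 2, 1))  # 4:2:2
```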

Digital Video Encoding
- Simplest representation
  - Rectangular grid of picture elements (pixels)
  - 3 * 8 bits (for RGB) means 16 million colors: more than enough
- Smoothness vs. flickering
  - Smoothness: number of different images (> 25/s)
  - Flicker: number of refreshes of the display (> 70/s)
  - Good computer monitors rescan at > 70 Hz
  - The image is repainted from RAM: no interlacing
- Common configurations (4:3 aspect ratio)
  - VGA (640*480), SVGA (800*600), XGA (1024*768)
  - XGA needs 472 Mbps at 25 frames/s (1024 * 768 pixels * 24 bits * 25 frames/s)

Picture Compression, JPEG (1)
[Diagram: source image → JPEG compression (DCT → quantization → entropy coding) → compressed image]
Encoding (at source)
1. Block preparation: RGB → YUV or YIQ
  - Square blocks of 4 pixels in U and V are averaged (gains a factor of 2)
  - 128 is subtracted from each element, so that 0 lies in the middle of the value range
  - Each matrix is divided into 8 * 8 pixel blocks
2. Discrete Cosine Transformation (DCT) applied to each block
  - Converts the 64 signals in the spatial dimensions into 64 spatial frequencies, similar to the fast Fourier transformation
  - Energy compaction and decorrelation effects
  - DCT(0, 0): the DC value, proportional to the average of the 64 input signals
  - For a homogeneous image all other (AC) values are 0
  - In principle without loss, apart from rounding errors

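Forward and Inverse DCT
[Equations: definitions of the forward and the inverse DCT, each with a normalization factor defined piecewise ("... for ..., ... otherwise"); the formulas themselves are not preserved in the transcript]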
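A minimal sketch of a forward and inverse DCT on one 8 x 8 block. It assumes the orthonormal form commonly used with JPEG, $F(u,v) = \frac{1}{4} C(u) C(v) \sum_x \sum_y f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$ with $C(k) = 1/\sqrt{2}$ for $k = 0$ and 1 otherwise; the slide's own normalization may differ. The direct four-fold loop is chosen for clarity, not speed.

```python
import math

N = 8  # block size; the factor 0.25 below is 2/N and only holds for N = 8

def norm(k):
    """Normalization factor: 1/sqrt(2) for index 0, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, u):
    """Cosine basis function for spatial index x and frequency index u."""
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct2(block):
    """Forward 2-D DCT of an N x N block (list of rows)."""
    return [[0.25 * norm(u) * norm(v)
             * sum(block[x][y] * basis(x, u) * basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT, reconstructing the N x N block."""
    return [[0.25 * sum(norm(u) * norm(v) * coeffs[u][v]
                        * basis(x, u) * basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# A homogeneous block: only the DC coefficient is non-zero
flat = [[10] * N for _ in range(N)]
flat_coeffs = dct2(flat)
print(round(flat_coeffs[0][0], 6))

# Round trip on a non-trivial block: lossless apart from rounding errors
ramp = [[x * N + y for y in range(N)] for x in range(N)]
restored = idct2(dct2(ramp))
```

With this normalization the DC coefficient of the homogeneous block is 8 times the sample value, i.e. proportional to the block average.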

Picture Compression, JPEG (2)
3. Quantization: lossy, gains bits
  - Divides each DCT coefficient by a weight
  - A set of standardized weight tables
4. Differential coding of the DC values
5. Linearization and run-length coding
  - Zig-zag scan, to get long runs of 0
6. Huffman (variable bit length) coding
  - Frequent codes with fewer bits
Steps 4 to 6 form the entropy coding, which is lossless.
Decoding (at destination)
- Applies the same steps in reverse order
- Decoding takes ca. the same time: roughly symmetric
- Ca. 20:1 compression effect, with loss
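Steps 3 and 5 can be sketched as follows. This is a simplified illustration: the flat weight table, the sample coefficients and the `(zero_run, value)` pair format are assumptions for the example; actual JPEG uses standardized weight tables and a more elaborate symbol format before Huffman coding.

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                      # anti-diagonal index
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))  # direction alternates

def quantize(coeffs, weights):
    """Divide each coefficient by its weight and round (the lossy step)."""
    n = len(coeffs)
    return [[round(coeffs[r][c] / weights[r][c]) for c in range(n)]
            for r in range(n)]

def run_length(values):
    """Encode a sequence as (zero_run, value) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")   # trailing zeros are implied
    return pairs

# Hypothetical DCT coefficients: a few large low-frequency values, small rest
coeffs = [[3] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[1][1] = 160, -33, 21, 35
weights = [[16] * 8 for _ in range(8)]   # hypothetical flat weight table

quantized = quantize(coeffs, weights)
scan = [quantized[r][c] for r, c in zigzag_order()]
encoded = run_length(scan)
print(encoded)
```

After quantization the small coefficients become 0, so the zig-zag scan ends in one long run of zeros that collapses into the end-of-block marker.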

Entropy and Huffman Coding
- Entropy: a measure of the uncertainty about the next code to come out of a coder
  - Low when we are pretty sure, high otherwise
  - Maximum when all probabilities are equal
  - $\text{Entropy} = \sum_n p_n \log_2(1 / p_n)$ (Shannon)
- Huffman: create code symbols based on the probability of each symbol's occurrence
  - Code length is variable: shorter codes for common symbols
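A small sketch that computes the entropy of a symbol distribution and the code lengths a Huffman coder would assign. The four-symbol distribution is a made-up example; the function names are illustrative.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Return {symbol: code length} for a {symbol: probability} mapping."""
    # Heap entries: (probability, tie-breaker, {symbol: depth so far})
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # the two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
average_length = sum(probs[s] * lengths[s] for s in probs)

print(entropy(probs.values()))
print(lengths)
print(average_length)
```

For this distribution (all probabilities are powers of 1/2) the average Huffman code length equals the entropy; in general it can only be equal or larger.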

Picture Compression, JPEG (3)
[Figure: RGB input data and block preparation]

Picture Compression, JPEG (4)
[Figure: one block of the Y matrix and the DCT coefficients]

Picture Compression, JPEG (4)
[Figure: computation of the quantized DCT coefficients]

Picture Compression, JPEG (5)
[Figure: order of quantized values when transmitted]

DCT-effect Example
[Figure]

Video Compression
- DV camcorders often use JPEG-like compression
  - No time for sophisticated compression
- Compression in the time domain
  - The difference between consecutive frames is often small
  - Remove inter-frame redundancy
  - Sophisticated encoding, (relatively) fast decoding

Residual Frame
- The residual frame is typically much simpler
- Also called DFD (displaced frame difference)
[Figure: residual image, not motion compensated yet]

Basic Idea
- Parts (blocks) of the input are matched against the reference frames
  - Costs CPU and memory (for the reference frames)
  - The size of the search area is not standardized
- If there is a good match (the blocks are "similar")
  - Only the motion vector + difference must be encoded
- If there is no good match: the block is encoded as usual
[Figure: previous (reference) frame and current (input) frame]

Motion Estimation and Compensation
- Motion estimation
  - Creates a model of the current frame, usually via block matching, based on reference frames (past + future)
  - Goal: an accurate model (minimal residual energy) at acceptable computation cost
- Motion compensation
  - Subtracts the model from the frame (usually block-based)
  - Produces a motion-compensated residual frame
  - This is coded along with the motion vectors
- The encoded residual is decoded and reconstructed to the decoded frame
- The reconstructed frame is stored as reference

Estimation + Compensation Diagram
[Diagram with the elements: current frame, reference frame(s), motion estimation, model, motion vectors, motion compensation, encode residual, decode residual, reconstruction, reconstructed frame (quality as seen at the decoder)]

Motion Estimation Example
[Figure: frame 1, frame 2, motion vectors]

Block Matching
- The usual standards (MPEG-1, MPEG-2, MPEG-4, H.261, H.263) use it
- For each block of luminance (e.g. 16*16) in the current frame, the best match is searched for in the reference frame
  - The search area is centered around the current block
- The best match provides minimum energy
  - E.g. Mean Squared Error: $MSE = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (Current_{i,j} - Ref_{i,j})^2$
  - Sum of Absolute Errors: $SAE = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} |Current_{i,j} - Ref_{i,j}|$

Block Matching Example

Current block:
1 3 2
6 4 3
5 4 3

Reference area (candidate positions are labelled by motion vector, from (-1,1) at the upper left to (1,-1) at the lower right):
1 3 2 4 5
6 4 2 3 2
5 4 2 2 3
4 4 3 3 1
4 6 7 4 5

MSE for the middle position (0,0):
$((1-4)^2 + (3-2)^2 + (2-3)^2 + (6-4)^2 + (4-2)^2 + (3-2)^2 + (5-4)^2 + (4-3)^2 + (3-3)^2) / 9 = 2.44$

Motion vector (x,y)    MSE
(-1,-1)                4.67
(0,-1)                 2.89
(1,-1)                 2.78
(-1,0)                 3.22
(0,0)                  2.44
(1,0)                  3.33
(-1,1)                 0.22
(0,1)                  2.56
(1,1)                  5.33
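The example can be reproduced with a few lines of code. This sketch reads the sample values off the flattened transcript, so the grid layout is an assumption; it also assumes, as the figure labels suggest, that y grows upwards, so that (-1,1) is the upper-left candidate.

```python
current = [[1, 3, 2],
           [6, 4, 3],
           [5, 4, 3]]

reference = [[1, 3, 2, 4, 5],
             [6, 4, 2, 3, 2],
             [5, 4, 2, 2, 3],
             [4, 4, 3, 3, 1],
             [4, 6, 7, 4, 5]]

def mse(cur, ref, x, y):
    """Mean squared error for motion vector (x, y), with y pointing upwards."""
    n = len(cur)
    top = 1 - y    # row of the candidate's upper-left corner (5x5 area, 3x3 block)
    left = 1 + x   # column of the candidate's upper-left corner
    total = sum((cur[i][j] - ref[top + i][left + j]) ** 2
                for i in range(n) for j in range(n))
    return total / n ** 2

# Full search over all nine candidate positions
costs = {(x, y): mse(current, reference, x, y)
         for y in (-1, 0, 1) for x in (-1, 0, 1)}
best = min(costs, key=costs.get)

print(round(costs[(0, 0)], 2))
print(best, round(costs[best], 2))
```

The best match is the candidate with the lowest error; its offset becomes the motion vector, and only the (small) difference block remains to be coded.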

Encoder + Decoder Steps
Encoder
1. Calculate the energy difference between the current block and neighboring regions of the reference
2. Select the best matching (lowest error) region
3. Subtract the matching region from the current block
4. Encode and transmit the difference block
5. Encode and transmit the motion vector
Decoder
1. Decode the difference block and the motion vector
2. Add the difference to the matching region in the reference frame (pointed to by the motion vector)

Full Search: Search Window and Order
- The search is restricted to a limited area: the search window (SW)
- Inside the SW, comparison takes place at all possible positions
- Search order
  - Relevant with early termination (a comparison is abandoned as soon as totalSAE > minSAE)
  - Raster order
  - Spiral order: finds the best match sooner

Fast (not full) Search: Three-Step (N-Step) Search (TSS)
- Search window size = $2^N - 1$ pixels in each direction
1. Search location (0,0)
2. Set $S = 2^{N-1}$ (step size)
3. Search 8 locations ±S pixels around (0,0)
4. Make the location with the smallest SAE the new origin
5. Set S = S / 2
6. Repeat 3-5 until S = 1
[Figure: TSS with N = 3 (S = 4, 2, 1); the search positions are numbered by step 1, 2, 3]
- Fast search algorithms may miss the global minimum
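A minimal sketch of the N-step search. To keep it self-contained, the matching cost is passed in as a callback: `cost(dx, dy)` is a hypothetical stand-in for the SAE between the current block and the reference block displaced by (dx, dy). The sketch also assumes that the final pass with S = 1 is executed, as the figure with steps 1, 2, 3 suggests.

```python
def three_step_search(cost, n_steps=3):
    """Return ((dx, dy), cost) found by an N-step search around (0, 0)."""
    best = (0, 0)
    best_cost = cost(0, 0)            # step 1: search location (0, 0)
    step = 2 ** (n_steps - 1)         # step 2: S = 2^(N-1)
    while step >= 1:
        origin = best
        for dy in (-step, 0, step):   # step 3: 8 locations at +/-S
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue          # the origin is already evaluated
                cand = (origin[0] + dx, origin[1] + dy)
                c = cost(*cand)
                if c < best_cost:     # step 4: smallest cost becomes new origin
                    best, best_cost = cand, c
        step //= 2                    # step 5: halve the step size
    return best, best_cost

# Hypothetical cost surface with a single minimum at offset (5, -3)
def toy_cost(dx, dy):
    return abs(dx - 5) + abs(dy + 3)

found, found_cost = three_step_search(toy_cost, n_steps=3)
print(found, found_cost)
```

With N = 3 only 1 + 3 * 8 = 25 positions are evaluated instead of the 15 * 15 = 225 positions of a full search over ±7 pixels. On a cost surface with several local minima the result may differ from the full-search optimum.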

Fast (not full) Search: Nearest Neighbors Search (NNS)
- Motion vectors are predicted from neighboring ones
  - Neighboring MVs are often similar
  - The difference has small values
  - H.263 and MPEG-4 use the median of 3 previous motion vectors
1. Search location (0,0)
2. Set the search origin to the predicted vector location
3. Search the 4 neighboring positions (+ shape)
4. If the search origin is best, or the edge of the SW is reached: stop; otherwise set the best match as the new origin
5. Go to 3
[Figure: NNS with median predicted vector (-4,3); the search positions are numbered by step]

The Moving Pictures Experts Group
- ISO/IEC JTC1 SC29 WG11, better known as MPEG
- Traditional MPEG subgroups for
  - Requirements
  - Audio
  - Video
  - Systems
  - Test
  - Implementation Studies
  - Liaison
  - HoD (Head of Delegation)
- New for MPEG-4
  - SNHC: coding of hybrid natural/synthetic objects
- New for MPEG-7
  - MDS: Multimedia Description Schemes

The MPEG Standardization Process
1. Requirements
2. Call for proposals
3. Evaluation
4. Core experiments
5. Draft specification
6. National bodies agree
Practically speaking
- Initial requirements may need revising
- Core experiments are done in multiple rounds
- Spec drafting and national body (NB) agreement take multiple rounds

Stages of Standardization
To obtain an International Standard
- WD: Working Draft
- CD: Committee Draft
- FCD: Final CD
- DIS: Draft IS
- FDIS: Final DIS
- IS: International Standard
To amend an International Standard
- PDAM: Proposed Draft Amendment
- FPDAM: Final PDAM
- DAM: Draft Amendment
- Amd: Amendment

MPEG-1, MPEG-2, H.261
- In the MPEG family, MPEG-1 and MPEG-2 concentrate on compression (see the others later)
- H.261 (H.263, H.264): similar standards of CCITT (ITU) for video conferencing
- MPEG-1 and MPEG-2 are asymmetrical and use both intra- and inter-frame coding

Standard   Compression         Bandwidth     Usage
MPEG-1     < 200:1             1.5 Mbps      CD-ROM
MPEG-2     < 200:1             4 Mbps        Digital TV
H.261      100:1 to 2,000:1    p * 64 Kbps   Video conferencing (ISDN), videophone

MPEG Data Layers
[Figure: MPEG data layers, with the following annotations]
- Handles data buffering, describes bit rate + min. decoder storage
- Macroblock: 16*16 pixels luminance + 2 * (8*8) pixels chrominance = 4 * (8*8) luminance + 2 * (8*8) chrominance

MPEG Frame Kinds
- I (Intracoded) frames
  - Self-contained JPEG-like pictures (DCT on 8*8 blocks)
  - They must be sent periodically for synchronization
- P (Predictive) frames
  - Block-by-block difference with the last I/P frame
  - Builds motion vectors, based on macroblocks (MB)
- B (Bi-directional) frames
  - Differences with the last and the next I/P frame
  - Previously hidden areas can be better predicted from the next frame
- D (DC coefficients only) frames, for fast forward
- GOP (Group of Pictures)
  - Typically 15 frames, with 2 prediction levels

MPEG Compression
[Diagram: the input stream (Frame 1 to Frame 7) passes through MPEG compression into the compressed stream I frame, B frame, B frame, P frame, B frame, B frame, I frame; arrows mark forward prediction and bidirectional prediction; a GOP (Group of Pictures) is marked]
- Decoding, transmission and presentation order may differ, e.g. IPBB...
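Why the orders differ: a B frame needs its next I/P reference before it can be decoded, so that reference has to be transmitted first. A minimal sketch of such a reordering; the frame labels and the function name are illustrative.

```python
def coding_order(display_frames):
    """Reorder frames so that every B frame follows both of its references."""
    out, pending_b = [], []
    for frame in display_frames:
        if frame[0] == "B":
            pending_b.append(frame)   # must wait for the next I/P frame
        else:
            out.append(frame)         # the I or P reference goes first
            out.extend(pending_b)     # then the B frames that depend on it
            pending_b.clear()
    return out + pending_b            # trailing B frames, if any

display = ["I1", "B2", "B3", "P4", "B5", "B6", "I7"]
print(coding_order(display))
```

The decoder restores the presentation order after decoding, which is one reason it has to buffer reference frames.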

MPEG Video Encoding Scheme
[Diagram with the elements: video input, adders, DCT, quantizer, entropy encoding, buffer, encoded bit stream; feedback path with inverse quantizer, IDCT, referenced frames (I or P), motion estimator and motion compensator; the path taken depends on I or P frames]

MPEG Video Decoding Scheme
[Diagram with the elements: encoded bit stream, buffer, entropy decoding, inverse quantizer, IDCT, adder, video output; motion compensator and referenced frames]

MPEG-1 Audio Encoding
- Compatible with CD-DA and DAT (tape)
- Three quality layers
- Transformation into the frequency domain
  - Fast Fourier Transformation (FFT)
  - 32 non-overlapping subbands
- Quantization may be different for each subband
  - Controlled by the psychoacoustic model
  - Low noise: finer quantization
  - High noise: coarser quantization
[Diagram: division into 32 frequency bands, quantization, entropy coding; the psychoacoustic model controls the quantization]

MPEG-1 Decoder Architecture
- Demultiplexes audio and video
- Synchronization by time stamps

New concepts in H.264/AVC (1)
- Several small changes rather than one big change
- New structure elements
  - Frames can be separated into slices
  - Macroblocks can be partitioned
- Intra prediction
  - Macroblocks may be predicted from (already encoded, decoded and reconstructed) samples within the same frame
  - Otherwise handled similarly to inter-prediction: residual + motion
- New concepts for inter prediction
  - Uses quarter-pixel estimation (1/4 pixel) for luma
  - Motion vectors are predicted as well
  - One picture can have several reference pictures from a large set
- New concepts for B-macroblocks
  - A B-MB may refer to two reference blocks (averaging)

New concepts in H.264/AVC (2)
- Skipped macroblocks
  - No residual is stored for a skipped macroblock
  - Nevertheless, motion vectors are predicted for a skipped block
- The transformation process has slightly changed
  - It is performed on 4x4 blocks
  - Uses an integer transformation, similar to DCT, but faster
- Variable quantization
  - Values of the quantization table may change from block to block
- Extended entropy coding
  - Two advanced methods for entropy coding (CAVLC/CABAC)
- Deblocking filter
  - Avoids blocking artifacts by applying a blur filter
- Switching slices
  - Can be used for stream switching, e.g. for fast-forward
- Interlaced video (for TV)

H.264/AVC Profiles
[Table: H.264 profiles. Columns: Baseline, Main, Extended, High. Rows (features): I Slices, P Slices, B Slices, SI Slices, SP Slices, CAVLC, CABAC, Slice Groups, Arbitrary Slice Order, Redundant Slices, Data Partitioning, Weighted Prediction, Interlace (Fields), Different Bit-Depth, Different Chroma Subsampling (other than 4:2:0), 8x8 Transformation, Scaling List in SPS. The per-profile support marks are not preserved in the transcript.]

Slices
- Frames are subdivided into slices
  - A slice may contain 1..n macroblocks (MBs)
  - The size of a slice does not need to be constant within a video
- Slices are independent coding elements
  - Independent coding/decoding of slices
  - Motion estimation only within the slice of the corresponding MB
  - Slices help to limit the propagation of errors
[Figure: a frame divided into 2 slices and into 4 slices]

Slice-Types
- I-Slice
  - Contains I-MBs (intra-coded)
- P-Slice
  - Contains P-MBs and/or I-MBs
  - A P-MB is predicted from one past or one future reference block
- B-Slice
  - Contains B-MBs and/or I-MBs
  - A B-MB can be predicted from two past or two future reference blocks
  - Not included in the baseline profile
- SP-Slice (Switching P) and SI-Slice (Switching I)
  - Enable switching between coded streams (see later)

Macroblock
- General concept (not new)
  - Contains the data of the corresponding 16x16 area of a picture
  - Luma blocks: 16x16 pixels
  - Chroma blocks: 8x8 pixels (for the default sub-sampling 4:2:0)
- New concepts
  - P- and B-macroblocks can be partitioned
    - Large partitions for homogeneous areas
    - Small partitions for areas with much information ("high-energy")
    - Partitions are the fundamental units of prediction
  - I-macroblocks may use intra-prediction

Macroblock Partitions
Macroblock partitions:
- 1 macroblock partition of 16*16 luma samples and associated chroma samples
- 2 macroblock partitions of 16*8 luma samples and associated chroma samples
- 2 macroblock partitions of 8*16 luma samples and associated chroma samples
- 4 sub-macroblocks of 8*8 luma samples and associated chroma samples
Sub-macroblock partitions (further partitioning of one 8x8 block):
- 1 sub-macroblock partition of 8*8 luma samples and associated chroma samples
- 2 sub-macroblock partitions of 8*4 luma samples and associated chroma samples
- 2 sub-macroblock partitions of 4*8 luma samples and associated chroma samples
- 4 sub-macroblock partitions of 4*4 luma samples and associated chroma samples
Figure from: [2]

Intra Prediction (1)
- 9 prediction modes for 4x4 luma sub-blocks
- Depending on 13 neighboring samples (from above, left, left-above and right-above); in the figure, a is the sample left-above, b to e lie above, f to i right-above, and j to m to the left
- The mode with min. SAE is selected
- Modes
  - 0: vertical
  - 1: horizontal
  - 2: DC (average): $(b + c + d + e + j + k + l + m + 4) / 8$
  - 3: diagonal down left (45° to the left, interpolated)
  - 4: diagonal down right (45° to the right, interpolated)
  - 5: vertical right (26.6° to the left of vertical, interpolated)
  - 6: horizontal down (26.6° below horizontal, interpolated)
  - 7: vertical left (26.6° to the right of vertical, interpolated)
  - 8: horizontal up (26.6° above horizontal, interpolated)
- Interpolation example for predicted samples x, y and z: $x = (b + 2c + d) / 4$, $y = z = (d + 2e + f) / 4$
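A minimal sketch of mode selection by minimum SAE, restricted to the three simplest modes (0: vertical, 1: horizontal, 2: DC). The function names are illustrative; `top` holds the four samples above the block and `left` the four samples to its left. The interpolated modes 3 to 8 are not covered here.

```python
def predict_4x4(mode, top, left):
    """Predict a 4x4 block from 4 samples above (top) and 4 to the left (left)."""
    if mode == 0:    # vertical: copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == 1:    # horizontal: copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:    # DC: rounded mean of the 8 neighbors
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")

def sae(block, pred):
    """Sum of absolute errors between a block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c])
               for r in range(4) for c in range(4))

def best_mode(block, top, left):
    """Pick the mode whose prediction has the smallest SAE."""
    return min((0, 1, 2), key=lambda m: sae(block, predict_4x4(m, top, left)))

# A block with vertical stripes is predicted perfectly by mode 0
top = [10, 20, 30, 40]
left = [12, 12, 12, 12]
stripes = [list(top) for _ in range(4)]
print(best_mode(stripes, top, left))
```

Only the residual between the block and the chosen prediction is then transformed and coded.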

Intra Prediction (2)
- 4 prediction modes for 16x16 luma blocks, depending on 33 neighboring samples
  - 0: vertical
  - 1: horizontal
  - 2: DC (average)
  - 3: plane (diagonal)
- 4 prediction modes for 8x8 chroma blocks
  - Similar to the 16x16 luma modes
- Signalling the intra prediction modes
  - Signalling one of 9 modes for each 4x4 block (of a 16x16 luma macroblock) would need too many bits
  - The mode itself is predicted (at both sides), and only changes to the most probable mode must be coded

Inter Prediction (1)
- Tree-structured motion compensation
  - Several reference frames can be used
  - Each MB partition may refer to different frame(s)
  - Each MB partition has its own motion vector
- Each frame has one or two reference list(s)
  - Lists contain past or future pictures
  - Short-term (recently coded) pictures (identified by PicNum)
  - Long-term (older) pictures (identified by LongTermPicNum)
  - Partitions of P-MBs may only refer to list-0
  - Partitions of B-MBs may refer to list-0 and/or list-1

Inter Prediction (2)
- Reference list management
  - A sliding-window process (SWP) is used to replace the oldest short-term reference frames
  - The SWP is executed by both the encoder and the decoder
  - Lists can also be manipulated by explicit commands (adaptive memory control)
- Motion vectors
  - May use interpolated pixels
  - Luma: 1/4 pixel accuracy
  - Chroma: 1/8 pixel accuracy
  - Motion vectors are also predicted

Pixel Interpolation - Luma
- Interpolation of luma pixels to 1/4 accuracy by a 6-tap filter
- Half-pel vertical interpolation, e.g.: $h = (A - 5C + 20G + 20M - 5R + T + 16) / 32$
- Half-pel horizontal interpolation, e.g.: $b = (E - 5F + 20G + 20H - 5I + J + 16) / 32$
- Quarter-pel interpolation, e.g.: $n = (M + h + 1) / 2$
- Very expensive process!
[Figure from [2]: integer sample positions (A to U) and interpolated positions (a to s, aa to hh)]

Pixel Interpolation - Chroma
- Interpolation of chroma pixels to 1/8 accuracy using a linear scheme:
  $((8 - xFrac_C)(8 - yFrac_C) \cdot A + xFrac_C (8 - yFrac_C) \cdot B + (8 - xFrac_C) \, yFrac_C \cdot C + xFrac_C \, yFrac_C \cdot D + 32) / 64$
Figure from: [2]
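Both interpolation schemes, the 6-tap luma filter and the linear chroma scheme, can be sketched with integer arithmetic. The function names are illustrative, and the clipping to the 8-bit range 0..255 is an assumption that the slides do not show. The divisions by 32, 2 and 64 are written as right shifts.

```python
def clip(value, lo=0, hi=255):
    """Limit a value to the 8-bit sample range (assumed here)."""
    return max(lo, min(hi, value))

def half_pel(s0, s1, s2, s3, s4, s5):
    """6-tap filter (1, -5, 20, 20, -5, 1) with rounding: (... + 16) / 32."""
    return clip((s0 - 5 * s1 + 20 * s2 + 20 * s3 - 5 * s4 + s5 + 16) >> 5)

def quarter_pel(a, b):
    """Rounded average of two neighboring integer or half-pel samples."""
    return (a + b + 1) >> 1

def chroma_sample(a, b, c, d, x_frac, y_frac):
    """Linear interpolation between four chroma samples; offsets in 1/8 units (0..7)."""
    return ((8 - x_frac) * (8 - y_frac) * a + x_frac * (8 - y_frac) * b
            + (8 - x_frac) * y_frac * c + x_frac * y_frac * d + 32) >> 6

# On a linear ramp the half-pel value lands midway between the two center samples
h = half_pel(10, 20, 30, 40, 50, 60)
print(h)
print(quarter_pel(30, h))
print(chroma_sample(0, 80, 0, 80, 4, 0))
```

Each half-pel luma sample needs six neighbors, and positions such as the center of four integer samples need half-pel values in both directions, which is why the luma interpolation is expensive.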

Motion Vector Prediction (1)
Idea
- Encoding a motion vector (MV) for each partition may cost a lot
- MVs of neighboring areas are often highly correlated
- MVs are predicted from previously encoded MVs: MVp
- Only the difference is stored: MVD
- The motion vector prediction is calculated as the median of the motion vectors from
  - the topmost partition of the macroblock to the left (A)
  - the leftmost partition of the macroblock above (B)
  - the bottom- and leftmost partition of the macroblock right above (C)

Motion Vector Prediction (2)
Example
- Assume the encoder sets the motion vector value of the current macroblock partition to (-12/4)
- Motion vector prediction from the previously decoded motion vectors A = (8/4), B = (16/16), C = (0/3):
  - x = median(8, 16, 0) = 8
  - y = median(4, 16, 3) = 4
  - mvp (x/y) = (8/4)
- Decoded motion vector value: (8/4) + (-12/4) = (-4/8)
- Note: motion vector values are in 1/4 pixel units (for luma)
[Figure: neighboring partitions A, B and C with their already decoded motion vector values (which contain mvp), and the current partition]
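The example can be retraced in code. A minimal sketch with illustrative function names; vectors are (x, y) tuples in 1/4 pixel units.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]

def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the three neighboring motion vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))

def reconstruct_mv(mvp, mvd):
    """Decoder side: prediction plus transmitted value."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

mvp = predict_mv((8, 4), (16, 16), (0, 3))   # neighbors A, B, C
mv = reconstruct_mv(mvp, (-12, 4))           # value coded for the current partition

print(mvp)
print(mv)
```

Because neighboring vectors are usually similar, the transmitted values tend to be small and therefore cheap to entropy-code.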

B-Macroblocks
- A B-MB can use as reference (a partition of the same size of)
  - one past or one future frame
  - one past and one future frame (bi-prediction)
  - two past frames
  - two future frames
- Prediction modes
  - Motion-compensated prediction (default for bi-prediction)
    - Uses the average of both reference partitions
    - Two motion vectors
  - Direct prediction
    - Used for skipped B-MBs
    - No motion vector is stored for a direct-predicted B-MB
    - Motion vectors are instead predicted from previous blocks
  - Weighted prediction
    - Each reference sample is scaled by a weighting factor
    - Can also be applied to P-macroblocks
    - May be used e.g. when one scene fades into another
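The difference between the default average and weighted prediction can be sketched as follows. This is a simplified illustration, not the exact procedure of the standard: offsets are omitted, samples are assumed to be 8-bit, and the parameter names `w0`, `w1` and `log_wd` are chosen here for illustration.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, v))


def bi_pred_default(p0, p1):
    """Default bi-prediction: rounded average of the two reference partitions."""
    return [(a + b + 1) >> 1 for a, b in zip(p0, p1)]


def bi_pred_weighted(p0, p1, w0, w1, log_wd=5):
    """Simplified weighted bi-prediction with integer weights (offsets omitted)."""
    rnd = 1 << log_wd
    return [clip255((a * w0 + b * w1 + rnd) >> (log_wd + 1))
            for a, b in zip(p0, p1)]


past = [100, 100, 100]
future = [200, 200, 200]
print(bi_pred_default(past, future))
# During a fade, the past frame may get more weight than the future one.
print(bi_pred_weighted(past, future, w0=48, w1=16))
```

With equal weights (`w0 = w1 = 32` for `log_wd = 5`) the weighted form reduces to the default average.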

Transformation (1)
1. DCT-based integer transform
- Operates on 4x4 blocks of residual data (1 DC and 15 AC coefficients)
- Based on matrix multiplications (with weighting)
- The core part of the transform can be computed using only additions, subtractions and shift operations
- Inverse DCT-based transform ($\otimes$ denotes element-wise multiplication):
$$X' = C_i^T \,(Y \otimes E_i)\, C_i =
\begin{bmatrix} 1 & 1 & 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} & -1 & -1 \\ 1 & -\tfrac{1}{2} & -1 & 1 \\ 1 & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}
\left( [\text{coeffs}] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \right)
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$$
- $E_i$ is a predefined weighting matrix required for scaling the coefficients (in order to approximate the DCT)
[Figures from [1]]
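To illustrate the claim that the core transform needs only additions, subtractions and shifts, here is a sketch of the forward core transform $W = C_f X C_f^T$ as a butterfly. It assumes the usual H.264 forward matrix $C_f$ with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1); the scaling step is left out, and the function names are illustrative.

```python
def core_1d(x0, x1, x2, x3):
    """One 4-point butterfly: computes C_f * x using adds, subtracts and shifts only."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (s03 + s12,           # 1  1  1  1
            (d03 << 1) + d12,    # 2  1 -1 -2
            s03 - s12,           # 1 -1 -1  1
            d03 - (d12 << 1))    # 1 -2  2 -1


def core_transform_4x4(block):
    """Forward core transform of a 4x4 residual block (scaling omitted)."""
    rows = [core_1d(*r) for r in block]        # transform each row
    cols = [core_1d(*c) for c in zip(*rows)]   # then each column
    return [list(r) for r in zip(*cols)]       # transpose back to row-major order


flat = [[1, 1, 1, 1] for _ in range(4)]
print(core_transform_4x4(flat))
```

A flat block ends up with all its energy in the DC coefficient, which is what the following Hadamard stage exploits.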

Transformation (2)
2. Hadamard (generalized Fourier) transformation
- DC coefficients of all 4x4 blocks are grouped together and transformed again with a Hadamard transformation
  - All 16 luma DC coefficients of a 16x16 intra block
  - All 8 (2x4) chroma DC coefficients of an intra block
- Inverse Hadamard transform for luma DC coefficients:
$$f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} c_{00} & c_{01} & c_{02} & c_{03} \\ c_{10} & c_{11} & c_{12} & c_{13} \\ c_{20} & c_{21} & c_{22} & c_{23} \\ c_{30} & c_{31} & c_{32} & c_{33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$
- Luma: position of the DC coefficient (c_00) of each 4x4 luma block, and number of the 4x4 block of residual coefficients (ordered as stored in the bitstream):

  DC position      Block number
  00 01 02 03       0  1  4  5
  10 11 12 13       2  3  6  7
  20 21 22 23       8  9 12 13
  30 31 32 33      10 11 14 15

- Chroma: DC coefficients of both 2x2 chroma blocks (same layout for each of the two chroma components):

  DC position      Block number
  00 01             0  1
  10 11             2  3

[Figures from [1]]
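A sketch of the 4x4 Hadamard stage for the luma DC coefficients, written as two plain matrix products. The matrix is defined inside the block; scaling and rounding are omitted, and the helper names are illustrative.

```python
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard_4x4(c):
    """Compute f = H * c * H for a 4x4 block of DC coefficients (no scaling)."""
    return matmul(matmul(H, c), H)


# A single non-zero entry at position 00 spreads evenly over all 16 blocks.
dc = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(hadamard_4x4(dc))
```

Since H is symmetric and H * H = 4 * I, applying the function twice returns the input scaled by 16, which is why forward and inverse transform share the same matrix.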

Quantization
- Based on quantization tables (qt) and scaling lists
- Basic operation: $Z_{ij} = \mathrm{round}(Y_{ij} / Q_{step})$
- Quantization parameters (QPs)
  - Control the selection of the quantizer step $Q_{step}$
  - Higher QP -> higher values in qt -> coarser quantization -> worse quality
  - Lower QP -> lower values in qt -> finer quantization -> better quality
- H.264/AVC defines standard default scaling lists
- Sequence Parameter Sets
  - Non-default scaling lists can be defined
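The basic operation can be sketched in floating point. Two assumptions go beyond the slide: the step-size table for QP 0..5 with doubling every 6 QP steps is the commonly tabulated H.264 relation, and real codecs implement this with integer multiplications and shifts rather than a division. The function names are illustrative.

```python
# Q_step for QP 0..5; every increase of QP by 6 doubles the step size.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def q_step(qp):
    """Quantizer step size for a QP in the range 0..51."""
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(coeffs, qp):
    """Z = round(Y / Q_step), rounding halves away from zero."""
    qs = q_step(qp)

    def rnd(y):
        mag = int(abs(y) / qs + 0.5)
        return mag if y >= 0 else -mag

    return [[rnd(y) for y in row] for row in coeffs]


block = [[10, -10], [3, 0]]
print(q_step(28), quantize(block, 28))
print(q_step(4), quantize(block, 4))
```

The same block survives almost unchanged at QP 4 but loses its small coefficients at QP 28, which illustrates the QP/quality relation listed above.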

Entropy Coding
- CAVLC: Context Adaptive Variable Length Coding
  - Takes advantage of several characteristics of quantized 4x4 blocks, e.g.
    - Uses run-length coding
      - Advantageous for long runs of zeros
    - Uses trailing-ones signaling (how many TOs? which signs?)
      - Because trailing non-zero coefficients are often -1 or 1
- CABAC: Context Adaptive Binary Arithmetic Coding
  - Better compression performance than CAVLC
  - Higher complexity of computation (more expensive)
  - Several different probability models (PM)
    - A PM is selected according to the context of the syntax element
  - Uses probability estimation for syntax elements
  - The CABAC decoding process covers ~45 pages in the H.264/AVC standard!
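The block characteristics that CAVLC exploits can be made concrete with a small sketch. It only gathers the statistics (number of non-zero coefficients, trailing ones with their signs, zeros before the last non-zero coefficient) from a block that is assumed to be already in scan order; it does not produce any code words. The function name and the limit of three signalled trailing ones are stated here as assumptions of the sketch.

```python
def cavlc_stats(scan):
    """Collect CAVLC-relevant statistics from a scan-ordered coefficient list."""
    nonzero = [c for c in scan if c != 0]

    # Walk backwards over the non-zero coefficients and collect up to three +/-1 values.
    trailing_signs = []
    for c in reversed(nonzero):
        if abs(c) == 1 and len(trailing_signs) < 3:
            trailing_signs.append('+' if c > 0 else '-')
        else:
            break

    # Zeros located before the last non-zero coefficient are coded as runs.
    last = max((i for i, c in enumerate(scan) if c != 0), default=-1)
    total_zeros = sum(1 for c in scan[:last + 1] if c == 0)

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing_signs),
        "trailing_signs": trailing_signs,
        "total_zeros": total_zeros,
    }


print(cavlc_stats([0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```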

Deblocking Filter
- Reduces blocking distortion
  - Smooths block edges
- Inter prediction uses filtered reference frames
  - i.e. the deblocking filter is an integral part of encoding and decoding
[Figure from [3]: deblocked picture]

Further Concepts (1)
- SP / SI slices
  - Enable switching between different streams (e.g. streams with different bitrates)
  - Instead of inserting I-slices (as entry points), the encoder creates an SP-slice that has the same content as e.g. the P-slice of stream 2
  - Can be decoded using either the previous P-slice or the SP-slice
  - Can also be used for random access within one stream (by the decoder)
[Figure: Stream-1 (I P P P P P P) and Stream-2 (I P P B P P P), connected by an SP-slice that equals the corresponding slice of Stream-2]

Further Concepts (2)
- Arbitrary Slice Ordering
  - Slices in a coded frame may follow any decoding order
  - E.g. the first MB number of slice 2 may be smaller than the first MB number of slice 1
- Slice Groups ("Flexible Macroblock Ordering")
  - Can be used for error resilience
  - A macroblock-to-slice-group map defines which group an MB belongs to
    - i.e. slices need not contain contiguous macroblocks
  - The standard already predefines some slice-group maps
    - Interleaved map
    - Dispersed map
    - Foreground and background map
- Redundant Slices
- Support for interlacing (i.e. fields rather than frames)

Bitstream Format
- The bitstream of H.264 is organized in NAL units (Network Abstraction Layer)
- A NAL unit has a one-byte header which includes the NAL unit type:
  - Sequence Parameter Set
  - Picture Parameter Set
  - Slice
  - Supplemental Enhancement Information (SEI)
- Layout: | NAL Header | RBSP (raw byte sequence payload) | NAL Header | RBSP | ...
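A sketch of how the one-byte NAL header can be split into its fields. The field names and bit widths (1 + 2 + 5 bits) and the listed type numbers follow the H.264 syntax as commonly documented; the function name and the small lookup table, which covers only the types named on the slide, are illustrative.

```python
# Subset of NAL unit types, limited to those mentioned above.
NAL_TYPES = {
    1: "slice (non-IDR)",
    5: "slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def parse_nal_header(byte):
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x01,  # 1 bit, must be 0
        "nal_ref_idc": (byte >> 5) & 0x03,         # 2 bits, 0 = not used for reference
        "nal_unit_type": byte & 0x1F,              # 5 bits
    }


for b in (0x67, 0x68, 0x65, 0x06):
    hdr = parse_nal_header(b)
    print(hex(b), hdr, NAL_TYPES.get(hdr["nal_unit_type"], "other"))
```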

Bitstream Coding
- Header fields
  - Typically Exp-Golomb codes (i.e. variable-length codes):

      leadingZeroBits = -1;
      for( b = 0; !b; leadingZeroBits++ )
          b = read_bits( 1 )
      codeNum = 2^leadingZeroBits - 1 + read_bits( leadingZeroBits )

      Binary code          Value
      1                    0
      010                  1
      011                  2
      00000000100000000    255
      00000000111111111    510

  - Additional variants used:
    - Signed Exp-Golomb
    - Truncated Exp-Golomb
    - Mapped Exp-Golomb
      - The code is an index into a table
      - Both partners know the table
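The decoding loop above translates directly into runnable code. In this sketch the `BitReader` class is a hypothetical helper that reads from a string of '0' and '1' characters; a real decoder would read from the byte stream instead. The signed variant uses the usual mapping of code numbers 1, 2, 3, 4, ... to +1, -1, +2, -2, ...

```python
class BitReader:
    """Hypothetical helper: reads bits from a string such as '00100'."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read_bits(self, n):
        """Read n bits (most significant first) and return them as an integer."""
        value = 0
        for _ in range(n):
            value = (value << 1) | int(self.bits[self.pos])
            self.pos += 1
        return value


def read_ue(reader):
    """Unsigned Exp-Golomb: count leading zeros, then read that many suffix bits."""
    leading_zero_bits = 0
    while reader.read_bits(1) == 0:
        leading_zero_bits += 1
    return (1 << leading_zero_bits) - 1 + reader.read_bits(leading_zero_bits)


def read_se(reader):
    """Signed Exp-Golomb: odd code numbers map to positive, even to negative values."""
    k = read_ue(reader)
    return (k + 1) // 2 if k % 2 else -(k // 2)


for code in ("1", "010", "011", "00000000100000000", "00000000111111111"):
    print(code, read_ue(BitReader(code)))
```

Because each code word announces its own length through the leading zeros, several values can be read back to back from one stream without separators.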

Performance of H.264/AVC
[Figure from [3]]