MPEG-2 Video Compression


MPEG-2 Video Compression November 29, 1999 Michael Isnardi e-mail: misnardi@sarnoff.com Reproduction in any form requires written permission from the. 1

MPEG Video Outline
- Introduction
- Video Basics
- Human Vision Basics
- Colorimetry Basics
- Video Compression Basics
- MPEG-1 Video
- MPEG-2 Video
- Rate Control, VBV, Stat Mux
- Practicing the Art of MPEG
- ATSC Video Constraints and Extensions

Video Basics: dissection of an image into scanning lines
[Figure: a video camera feeds a video monitor over a video cable. The waveform of a single scan line is shown as voltage (proportional to brightness) against time: sync and blanking, then active video crossing wall, hair, forehead, hair, wall.]

The Scanning Raster
[Figure: active video bordered by horizontal blanking and vertical blanking.]
- 525 lines (NTSC)
- 625 lines (PAL, Europe)

The Progressive Raster
[Figure: scan lines viewed edge-on, vertical position y against time, with active video, vertical blanking and the frame period marked.]
Note: All scan lines are sampled at each time instant.

The Interlaced Raster
[Figure: scan lines viewed edge-on, vertical position y against time, with active video, vertical blanking, the field period and the frame period marked.]
Note: Alternate scan lines are sampled at each time instant.
Nominal frame rates: 30 Hz (NTSC), 25 Hz (PAL, Europe).

Common Rasters for Video Coding (luminance values shown)
- 601: 720 pixels x 480 lines (NTSC) or 576 lines (PAL) of active video; interlaced raster (30 frames/sec NTSC, 25 frames/sec PAL)
- SIF (Source Input Format): 360 pixels x 240 lines (NTSC) or 288 lines (PAL) of active video; progressive raster (30 frames/sec NTSC, 25 frames/sec PAL)
- CIF (Common Intermediate Format): 360 pixels x 288 lines of active video; progressive raster (30 frames/sec)

Why Interlace?
Background
- In the 1930s, interlaced scanning was developed as a bandwidth-saving technique.
- Persistence of vision causes the two fields to fuse into a single image, without flicker.
- All broadcasting today uses interlaced scanning.
Advantages
- High vertical detail is retained for still portions of the scene.
Drawbacks
- Reduced vertical detail for moving areas.
- Flicker at edges of objects (e.g., text), which is why the computer industry uses progressive scanning for monitors.
- More complicated signal processing for resizing, frame rate conversion, etc.

Human Vision Basics
The Human Visual System (HVS) has limitations that can be exploited for video system design:
- limited response to black-and-white detail
- even more limited response to color detail
- image motion appears fluid at rates above 24 Hz
- foveal flicker is not annoying at picture rates above 24 Hz
- limited ability to track rapidly moving objects
- insensitivity to noise:
  - at object edges
  - in highly detailed areas of a scene
  - in bright areas of a scene
  - immediately after scene changes

Colorimetry Basics
[Figure: color video camera (R, G, B) -> RGB to YC1C2 -> transmission channel(s) carrying Y, C1, C2 -> YC1C2 to RGB -> color video monitor (R, G, B). All signals are gamma-corrected.]
- In broadcast and studio applications, the gamma-corrected RGB taking primaries are transformed to YC1C2 transmission primaries.
- Y is the luminance (luma) component; C1 and C2 are the chrominance (chroma, or color difference) components.
- To exploit the HVS's reduced spatial response to chroma, C1 and C2 are further bandlimited in spatial frequency compared to Y.
- The exact transformation matrix is system-dependent.

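CCIR Rec. 601 Transformation
\[
\begin{bmatrix} Y \\ C_r \\ C_b \end{bmatrix} =
\begin{bmatrix} 0.30 & 0.59 & 0.11 \\ 0.50 & -0.42 & -0.08 \\ -0.17 & -0.33 & 0.50 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
In terms of the color differences:
\[
\begin{bmatrix} C_r \\ C_b \end{bmatrix} =
\begin{bmatrix} 0.00 & 0.71 \\ 0.56 & 0.00 \end{bmatrix}
\begin{bmatrix} B - Y \\ R - Y \end{bmatrix}
\]
[Figure: color-difference plane with horizontal axis B-Y, Cb and vertical axis R-Y, Cr.]
In 8-bit implementations:
- Y occupies 220 levels: [16, 235]
- Cr, Cb occupy 225 levels: [16, 240]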
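The matrix above can be exercised with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are made up for illustration, and the 8-bit mapping (luma offset 16 with 219 steps, chroma centred on 128 with 224 steps) is an assumption chosen to land on the [16, 235] and [16, 240] ranges quoted on the slide.

```python
# Rec. 601 matrix as printed on the slide (rows: Y, Cr, Cb; columns: R, G, B).
M601 = [
    [0.30, 0.59, 0.11],
    [0.50, -0.42, -0.08],
    [-0.17, -0.33, 0.50],
]

def rgb_to_ycrcb(r, g, b):
    """Map gamma-corrected R, G, B in [0, 1] to (Y, Cr, Cb)."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in M601)

def to_8bit(y, cr, cb):
    """Assumed 8-bit mapping: Y onto [16, 235], Cr/Cb onto [16, 240] centred on 128."""
    return (round(16 + 219 * y), round(128 + 224 * cr), round(128 + 224 * cb))

white = to_8bit(*rgb_to_ycrcb(1.0, 1.0, 1.0))
black = to_8bit(*rgb_to_ycrcb(0.0, 0.0, 0.0))
red = to_8bit(*rgb_to_ycrcb(1.0, 0.0, 0.0))
print(white, black, red)
```

White and black carry no color difference, so both chroma values sit at the mid-point; saturated red drives Cr to the top of its range.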

Video Compression Basics 12

What is Video Compression? The Orange Juice Analogy
[Figure: fresh-squeezed OJ has its water (H2O) removed to make concentrate, which is shipped, stored and sold; water is added back and it "tastes like fresh-squeezed". Water is the redundant element.]
In video compression, the encoder removes spatial and temporal redundancy; the decoder puts it back in.

Video Compression Techniques
- Remove spatial and temporal redundancy that exist in natural video imagery
  - correlation itself can be removed in a lossless fashion
  - important for medical applications
  - only realizes about 2:1 compression efficiency
- Exploit limitations in the Human Visual System
  - limited luminance and very limited color response
  - reduced sensitivity to noise in high frequencies (e.g., edges of objects)
  - reduced sensitivity to noise in brighter areas
  - goal is to throw away bits in a psychovisually lossless manner
  - can realize 50:1 or more compression efficiency

Major Image and Video Compression Technologies
- DCT based (international standards, economy of scale)
  - Motion JPEG: studio applications
  - H.261: videoconferencing
  - MPEG-1: CD-ROM multimedia
  - MPEG-2: DTV broadcast, DVD
- Subband/Wavelet
  - EZW: VLBR and browsing applications
- Other
  - DVI/Indeo: multimedia
  - Fractal: multimedia
  - DPCM: broadcast
  - Lossless (e.g., special JPEG mode): medical

Evolution of Video Compression Standards
JPEG (Joint Photographic Experts Group)
- mostly used for coding still images
- introduced DCT and Quantization as part of the "Tool Kit"
- "Motion JPEG" is intra-frame only, low compression, and low delay
H.261 (px64)
- used for video teleconferencing
- p x 64 kbps (p = 1, ..., 30)
- introduced motion compensated DCT (I and P frames)
- medium compression, low delay
MPEG-1, MPEG-2
- used for digital storage media and broadcast
- 1-15+ Mbps
- introduced the concept of B frames and field modes
- high compression, medium delay

Coding Efficiency
How does one compare the efficiency of various video compression methods? For example, the following video encoders all have the same quality. Which has the best coding efficiency? Which one has the worst?

Parameter          Coder 1    Coder 2    Coder 3    Coder 4
Image Size (HxV)   720x480    544x480    480x480    1920x1080
Bit Rate (R)       6 Mbps     4 Mbps     6 Mbps     19 Mbps
Frame Rate (F)     29.97 fps  30 fps     24 fps     29.97 fps
Chroma Format      4:2:0      4:2:2      4:4:4      4:2:0

Normalized Bit Rate
A meaningful comparative metric is the normalized bit rate, in units of bits/color pixel.
\[
\text{Normalized Bit Rate} = \frac{C \cdot R}{H \cdot V \cdot F} \quad \text{bits/color pixel}
\]
where
- C = Chroma Format Factor (C = 1/3 for 4:4:4, 1/2 for 4:2:2, 2/3 for 4:2:0)
- R = Bit Rate (bits/second)
- F = Frame Rate (frames/second)
- H, V = Horizontal and Vertical Size (luma pixels/frame)

Coding Efficiency Example
Now let's compare the four coders using Normalized Bit Rate:

Parameter          Coder 1    Coder 2    Coder 3    Coder 4
Image Size (HxV)   720x480    544x480    480x480    1920x1080
Bit Rate (R)       6 Mbps     4 Mbps     6 Mbps     19 Mbps
Frame Rate (F)     29.97 fps  30 fps     24 fps     29.97 fps
Chroma Format      4:2:0      4:2:2      4:4:4      4:2:0
Norm. Bit Rate     0.39       0.26       0.36       0.20

- Coder 1 has the worst coding efficiency. It uses the most bits/pixel.
- Coder 4 has the best coding efficiency. It uses the fewest bits/pixel.
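The table values can be reproduced directly from the normalized bit rate formula. The sketch below is only an illustration; the dictionary layout and function name are assumptions, while the numbers are the ones from the slide.

```python
# Chroma Format Factor C from the Normalized Bit Rate slide.
CHROMA_FACTOR = {"4:4:4": 1 / 3, "4:2:2": 1 / 2, "4:2:0": 2 / 3}

def normalized_bit_rate(h, v, bit_rate, frame_rate, chroma):
    """Return C * R / (H * V * F) in bits per color pixel."""
    return CHROMA_FACTOR[chroma] * bit_rate / (h * v * frame_rate)

# (H, V, R in bits/second, F in frames/second, chroma format)
coders = {
    "Coder 1": (720, 480, 6e6, 29.97, "4:2:0"),
    "Coder 2": (544, 480, 4e6, 30.0, "4:2:2"),
    "Coder 3": (480, 480, 6e6, 24.0, "4:4:4"),
    "Coder 4": (1920, 1080, 19e6, 29.97, "4:2:0"),
}

rates = {name: round(normalized_bit_rate(*p), 2) for name, p in coders.items()}
worst = max(rates, key=rates.get)   # most bits per color pixel
best = min(rates, key=rates.get)    # fewest bits per color pixel
print(rates, worst, best)
```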

MPEG Video 20

What is MPEG Video?
- MPEG = Moving Picture Experts Group
- Part of the International Organization for Standardization (ISO)
- Aim was to create the best video compression standards for multimedia and broadcast applications
- MPEG-1 Video aimed at SIF resolution
  - 352x240, 30 Hz, non-interlaced, 1.5 Mb/s
  - CD-ROM applications
- MPEG-2 Video aimed at CCIR-601 resolution
  - 720x480, 30 Hz, interlaced, 4-10 Mb/s
  - broadcast applications, including HDTV
- MPEG-1 and MPEG-2 are International Standards

MPEG-2 Video: Background
- MPEG-2 work started in November, 1991
  - Standard optimized at NTSC quality CCIR-601 video @ 10 Mbps
  - 39 algorithms competed in subjective tests, some very different from MPEG-1.
  - Large attendance, typically 175-200 participants. More than 75 organizations, including representatives of CE, telco, computer, broadcasting and universities.
- Design focus on interlaced CCIR-601 (720x480 pixels) video @ 4 to 9 Mbps. Targeted at broadcast and DVD applications.
- Extensible to lower and higher resolutions
  1) downward compatibility with MPEG-1
  2) includes support of HDTV formats
- MPEG-2 Video (ISO/IEC 13818-2) promoted to International Standard in November, 1995.

MPEG International Standards
MPEG-1 (ISO/IEC 11172)
- 11172-1: Systems
- 11172-2: Video
- 11172-3: Audio
- 11172-4: Conformance
- 11172-5: Software
MPEG-2 (ISO/IEC 13818)
- 13818-1: Systems
- 13818-2: Video
- 13818-3: Audio
- 13818-4: Conformance
- 13818-5: Software
- 13818-6: Digital Storage Media - Command & Control (DSM-CC)
- 13818-7: Non-Backward Compatible Audio
- 13818-9: Real-Time Interface
- 13818-10: DSM-CC Conformance
These standards are available from ISO and ANSI.

MPEG-1 vs. MPEG-2 Operating Points
[Figure: operating regions plotted as image size and frame rate (360x240, 720x480, 1280x720 and 1920x1080, all at 30 Hz) against bit rate (5 to 20 Mb/s). Regions shown: MPEG-1 CD-ROM; MPEG-2 Standard Definition Broadcast; MPEG-2 Standard Definition Production; MPEG-2 HDTV Broadcast.]

MPEG-2: a superset of MPEG-1 MPEG-2 = MPEG-1 Syntax Elements + Interlace Tools + New Syntax Structures + Scalable Modes + Profiles & Levels 25

MPEG-2 Interlace Tools
- Broadcast video is interlaced
- MPEG-1 does not handle interlaced video efficiently
- MPEG-2 adds key interlace tools:
  - Field Picture Structure
  - Field DCT
  - Field Prediction Modes
  - Alternate Zig-Zag Scan
  - 3:2 Pulldown Support
  - Field-Based Pan-and-Scan Support

Key Points about MPEG Video
- MPEG only specifies the bitstream syntax and the decoding process
- Encoding algorithms (e.g., Motion Estimation, Rate Control and Mode Decisions) are open to invention and proprietary techniques
- MPEG is asymmetric in that much less computational power is required in the decoder. Example:
  - SDTV MPEG-2 encode: 20 GIPS
  - SDTV MPEG-2 decode: 600 MIPS

MPEG Building Blocks
- MPEG Syntax
- Motion Estimation
- Motion Compensation
- Rate Control
- VLC and VLD
- DCT and DCT^-1
- Q and Q^-1

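MPEG Video Layers
[Figure: the layer hierarchy, from outermost to innermost.]
- Sequence (display order)
- GOP (display order, N=12, M=3): B B I B B P B B P B B P
- Picture (Y, Cr, Cb)
- Slice
- Macroblock: Y blocks 0 to 3, plus one Cr block and one Cb block (blocks 4 and 5)
Note: Y = Luma, Cr = Red-Y, Cb = Blue-Y.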

MPEG Video Layers (cont'd)
Important syntax elements in each layer:

Layer        Important syntax elements
Sequence     Picture size; frame rate; bit rate; buffering requirements; programmable coding parameters
GOP          Random access unit; SMPTE time-code
Picture      Timing information (buffer fullness, temporal reference); coding type (I, P, or B)
Slice        Intra-frame addressing information; coding re-initialization (error resilience)
Macroblock   Basic coding structure; coding method; motion vectors; quantization
Block        DCT coefficients

Key Concepts
For a given bit rate, the following coding parameters greatly affect picture quality:
- GOP Structure
  - longer GOPs improve picture quality but decrease random access (i.e., lengthen channel change time)
  - dynamic GOPs can be used creatively to handle scene changes and other effects
- MV Search Range
  - wider searches are better, but more costly
  - a large search range is a must for fast action (e.g., sports)
- Rate Control
  - mode decisions greatly affect the number of coded bits
  - proprietary schemes will continue to dominate

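Typical MPEG Encoder Structure
[Figure: block diagram. The predicted image is subtracted from the re-sequenced input to give the prediction error, which passes through DCT and Q (quantization parameters from the rate controller); the quantized coefficients go to the VLC encoder. An embedded decoder (Q^-1, DCT^-1) adds the predicted image back to form the reconstructed image, which is stored in Frame Memory 1 and Frame Memory 2. The motion estimator supplies motion vectors to the motion compensated prediction and to the VLC encoder. An inter/intra switch selects either the predicted image or "0".]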

Sequence
- For CD-ROM applications, sequences can be used to indicate relatively long clips (e.g. shots, scenes or entire movies)
- For broadcast applications, sequence headers are usually sent frequently (e.g., every GOP) so that key bitstream info is obtained at channel changes
[Figure: timelines for Video 1 and Video 2, divided into GOPs; each GOP starts with SEQ Header + GOP Header + I Frame Pic Header. The viewer changes channels partway through a GOP, but the decoder must wait until the next SEQ header to start decoding.]

MPEG-2 Structures
Sequence Structures
- Progressive Sequences: contain frame pictures
- Non-Progressive Sequences: may contain frame and field pictures
Frame Structures
- Progressive Frame: its two fields come from the same time instant
- Non-Progressive Frame: its two fields come from different times
Picture Structures
- Frame Picture
- Field Picture: must occur in pairs; a frame = two field pictures
Both frame and field pictures may be used in the same non-progressive sequence.

Sequence Types
[Figure: a Progressive Frame Picture, and a Non-Progressive Frame Picture composed of two Field Pictures.]
MPEG-2 allows both Progressive and Non-Progressive Sequences. A Non-Progressive Sequence may contain both Frame Pictures and Field Pictures.

Group of Pictures (GOP)
Contains three types of pictures:
- Intra (I) pictures: intraframe-only spatial DCT
- Predicted (P) pictures: DCT with forward prediction
- Bi-directional (B) pictures: DCT with bi-directional prediction
[Figure: I B B P B B P B B P B B I along the time axis, with forward prediction and bi-directional prediction arrows.]

Anchor Pictures
- I and P pictures
- stored in two frame buffers in encoder and decoder
- form the basis for prediction of P and B pictures
[Figure: I B B P B B P B B P B B I along the time axis, with the I and P pictures marked as anchor pictures.]

I Pictures
- DCT coded without reference to any other pictures
- stored in a frame buffer in encoder and decoder
- used as basis of prediction for the entire GOP
[Figure: I B B P B B P B B P B B I along the time axis. All the P and B pictures that follow depend on the preceding I picture.]

P Pictures
- DCT coded with reference to the preceding anchor picture
- stored in a frame buffer in encoder and decoder
- use forward prediction only
[Figure: I B B P B B P B B P B B I along the time axis. The first P picture depends on the I picture; each later P picture depends on the preceding P picture.]

B Pictures
- DCT coded with reference to either the preceding anchor picture, the following anchor picture, or both
- use forward, backward or bi-directional prediction
[Figure: I B B P B B P B B P B B I along the time axis. A B picture between the I picture and the first P picture depends on both; a B picture between two P pictures depends on both P pictures.]

Forward Prediction
- a forward-predicted macroblock depends on decoded pixels from the immediately preceding anchor picture
- can be used to code macroblocks in P and B pictures
[Figure: I B B P B B P B B P B B I along the time axis, with forward prediction arrows.]
Note: the arrows, as shown, indicate direction of motion; if the arrows are reversed, they indicate coding dependencies.

Backward Prediction
- a backward-predicted macroblock depends on decoded pixels from the immediately following anchor picture
- can only be used to code macroblocks in B pictures
[Figure: I B B P B B P B B P B B I along the time axis, with backward prediction arrows.]

Bi-directional (Interpolated) Prediction
- a bi-directionally-predicted macroblock depends on decoded pixels from the anchor pictures immediately following and immediately preceding
- can only be used to code macroblocks in B pictures
[Figure: I B B P B B P B B P B B I along the time axis, with bi-directional prediction arrows.]

GOP Rules
- A GOP must contain at least one I picture
- This I picture may be followed by any number of I and P pictures
- Any number of B pictures may occur between anchor pictures, and B pictures may precede the first I picture
- A GOP, in coding order, must start with an I picture
- A GOP, in display order, must start with an I or B picture and must end with an I or P picture

Regular and Irregular GOPs
- Regular GOPs are defined by N and M*:
  - N is the I picture interval
  - M is the anchor picture interval. There are M-1 B pictures between anchor pictures
- Irregular GOPs are not defined by N and M, but are still allowed as long as they follow the GOP Rules.
Examples (all GOPs in display order):
- Regular: N=1, M=1 (12 GOPs shown): I I I I I I I I I I I I
- Regular: N=6, M=2 (2 GOPs shown): B I B P B P B I B P B P
- Regular: N=12, M=3 (1 GOP shown): B B I B B P B B P B B P
- Irregular: B B I B B B B B P P B P
*N and M are not MPEG syntax elements and are not used in any way by the specification.
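The regular patterns above follow mechanically from N and M. Here is a small sketch that generates one GOP in display order; the function name is hypothetical, and it assumes N is a multiple of M, as in all three regular examples.

```python
def regular_gop(n, m):
    """Return the display-order picture types of one regular GOP.

    n is the I picture interval and m the anchor picture interval,
    so every m-th picture is an anchor and the first anchor is the I picture.
    """
    types = []
    for i in range(n):
        if i % m != m - 1:
            types.append("B")      # M-1 B pictures precede each anchor
        elif i == m - 1:
            types.append("I")      # first anchor of the GOP
        else:
            types.append("P")      # remaining anchors
    return " ".join(types)

print(regular_gop(12, 3))
print(regular_gop(6, 2))
print(regular_gop(1, 1))
```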

Closed and Open GOPs
Closed GOPs can be decoded independently, without using decoded pictures in previous GOPs. Open GOPs require such pictures to be available.
- Closed GOPs. Regular: N=4, M=2 (3 GOPs shown): B I B P B I B P B I B P
  Note that the first B picture must be restricted to use backward prediction only.
- Open GOPs. Regular: N=4, M=2 (3 GOPs shown): B I B P B I B P B I B P
  Note that the first B picture depends on the last anchor picture from the previous GOP.

GOP Picture Orderings
Two distinct picture orderings:
- Display Order (input to encoder, output of decoder)
- Coding Order (output of encoder, input to decoder)
These are different if B frames are present: B frames must be reordered so that future anchor pictures are available for prediction. Note that reordering causes DELAY!
- GOP display order, input to encoder: B B I B B P B B P B B P
- GOP coding order, output of encoder: I B B P B B P B B P B B
- GOP display order, output of decoder: B B I B B P B B P B B P
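The reordering rule can be sketched as follows: each B picture is held back until the anchor picture that follows it in display order has been emitted. This is an illustrative sketch with a hypothetical function name; it handles a single GOP and simply appends any trailing B pictures, which in a real stream would wait for an anchor in the next GOP.

```python
def display_to_coding_order(display):
    """Reorder picture labels such as 'B0', 'I2', 'P5' from display to coding order."""
    coded, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)       # needs a future anchor first
        else:
            coded.append(pic)           # anchor (I or P) is sent first
            coded.extend(pending_b)     # then the B pictures that precede it
            pending_b = []
    coded.extend(pending_b)             # simplification for trailing B pictures
    return coded

display = [f"{t}{i}" for i, t in enumerate("BBIBBPBBPBBP")]
coded = display_to_coding_order(display)
print(" ".join(coded))
```

The digit in each label is the display position, which makes the delay visible: picture I2 is transmitted before B0 and B1.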

Slice Structures
- A slice is a collection of macroblocks in raster scan order.
- Restriction on slice sizes:
  - MPEG-1 has none. A slice can be a single MB or the entire picture.
  - MPEG-2 restricts a slice to be contained within a row of macroblocks
- MPEG-2 allows gaps between slices in the General Slice Structure
- MPEG-2 defines the Restricted Slice Structure, in which no gaps are allowed. This is used in most Profiles and Levels.
[Figure: example of Restricted Slice Structure, with slices labeled A to Z.]

Chroma Formats and Picture Sizes

Format   Y       Cr      Cb      Notes
4:2:0    2Hx2V   HxV     HxV     Required in MPEG-1; CD-ROM and broadcast applications
4:2:2    2Hx2V   Hx2V    Hx2V    Option in MPEG-2; studio applications
4:4:4    2Hx2V   2Hx2V   2Hx2V   Option in MPEG-2

Macroblock Structures
- 4:2:0: 6 blocks (Y blocks 0 to 3; Cr and Cb blocks 4 and 5)
- 4:2:2: 8 blocks (Y blocks 0 to 3; Cr and Cb blocks 4 to 7)
- 4:4:4: 12 blocks (Y blocks 0 to 3; Cr and Cb blocks 4 to 11)
[Figure: spatial sampling relationship between luma and chroma samples, showing the MPEG-1 and MPEG-2 chroma positions for 4:2:0.]

Discrete Cosine Transform (DCT)
[Figure: Image (spatial domain, 8x8 pixels) -> 8x8 Forward DCT -> transform domain (8x8 coefficients) -> 8x8 Inverse DCT -> Reconstructed Image (spatial domain, 8x8 pixels).]
- DCT is an orthogonal transformation
- 2-D DCT is separable in x and y dimensions
- Has good energy compaction properties
- Close to the Karhunen-Loeve Transform (KLT), which is optimal but depends on image statistics
- Efficient hardware realization
- Theoretically lossless, but slightly lossy in practice due to round-off errors

Discrete Cosine Transform (cont'd)
- Transforms an 8x8 pixel block into an 8x8 frequency coefficient matrix
- Organizes video information in a way that is easy to compress and manipulate
- DCT is applied to Intra blocks as well as motion-compensated blocks

Pixels:
 255  255  255  255  255  255  255  255
 255  187  204  255  255  255  255  255
 255  122   20  102  230  255  255  255
 255  153    0    0   35  136  213  255
 255  196    0    0    0    0   17   94
 255  247   43    0    0    0    0    0
 255  255   82    0    0    0    0    0
 255  255  128    0    0    0    0    0

DCT coefficients after the 8x8 Forward DCT (DC at top left; horizontal frequency runs low to high from left to right, vertical frequency low to high from top to bottom):
 1105   238   358   158    30   -56   -49   -31
  548  -379  -143    19    71    66    32     9
  207   103  -171   -81   -58     7    24    31
  -52   162   -34   -66   -18   -20   -20   -21
  -33    13    71   -52   -18    -3     9    -4
   11   -56    56    23   -28    -3    -6     1
   -5   -14   -11    49    -1   -18    -9     8
  -27     9   -24    28    34   -24    -4     3

8x8 Blocks and Their Transforms
[Figure: blocks of 8x8 pixels from the MPEG Flower Garden sequence shown beside their DCT coefficients (DC marked), for a flat area, a vertical edge, a horizontal edge, a single pixel and a diagonal line.]

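DCT and IDCT Formulas
[Figure: pixels f(x,y) on axes x, y are mapped by the 2-D DCT to DCT coefficients F(u,v) on axes u, v; the DC coefficient is at the origin and the remaining entries are AC coefficients.]
Forward DCT:
\[
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}
\]
Inverse DCT:
\[
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}
\]
where
\[
C(u), C(v) = \begin{cases} 1/\sqrt{2} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}, \qquad N = 8
\]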
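The two formulas translate almost literally into code. The sketch below evaluates them directly and is meant only to make the definitions concrete: it is a slow O(N^4) evaluation, not the fast separable form used in practice. The function names and the f[x][y] / F[u][v] indexing are assumptions made for the example.

```python
import math

N = 8

def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(a, k):
    """cos[(2a + 1) k pi / 2N], shared by the forward and inverse transforms."""
    return math.cos((2 * a + 1) * k * math.pi / (2 * N))

def dct2(f):
    """Forward 2-D DCT of an N x N block f[x][y]; returns F[u][v]."""
    return [[(2 / N) * c(u) * c(v) * sum(
                f[x][y] * basis(x, u) * basis(y, v)
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(F):
    """Inverse 2-D DCT of N x N coefficients F[u][v]; returns f[x][y]."""
    return [[(2 / N) * sum(
                c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

flat = [[255] * N for _ in range(N)]                 # a flat area: only DC survives
F_flat = dct2(flat)
ramp = [[8 * x + y for y in range(N)] for x in range(N)]
round_trip = idct2(dct2(ramp))
print(round(F_flat[0][0]))
```

For a flat block every AC coefficient vanishes and the DC coefficient is N times the pixel value, which is the energy compaction property in its simplest form.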

2-D DCT Basis Images
[Figure: the 8x8 grid of 2-D DCT basis images, indexed by u (horizontal frequency, 0 to 7) and v (vertical frequency, 0 to 7).]

Quantization
[Figure: Image -> DCT -> Q -> quantized coefficients -> Q^-1 -> DCT^-1 -> Reconstructed Image.]
- Quantization can be thought of as dividing each transform coefficient by a frequency-dependent value, and then rounding or truncating to the nearest integer
- Inverse quantization is like multiplication
- Quantization coefficients can be tailored to the noise sensitivity of the Human Visual System
- Quantization is LOSSY!
  - Reconstructed pixels usually differ in value from the original
  - Quantization causes information to be irretrievably lost

Quantization Tools
Quantization Matrix (QM)
- the 8x8 matrix can be shaped so that coarser quantization of high spatial frequencies occurs
- coarser quantization of high spatial frequencies saves bits but causes little or no subjective degradation
- in MPEG-2, up to four QMs (luma intra/non-intra and chroma intra/non-intra) can be changed at the picture rate
- default matrices are specified and need not be sent, but different ones can be downloaded
Quantizer Scale (QS)
- QS can change on a macroblock basis
- rate control's job is to modify QS in a way that keeps picture quality high for a given bit rate

MPEG-2 Quantizer Scale Types
[Figure: quantizer scale (0 to 120) plotted against quantizer_scale_code [1, 31] (sent in bitstream), with one curve for the Linear Quantizer Scale (q_scale_type = 0) and one for the Nonlinear Quantizer Scale (q_scale_type = 1).]

Quantization Example
DCT frequency coefficients T[u][v] (A), DC at top left:
 276   59   89   39    7  -13  -12   -7
 137  -94  -35    4   17   16    7    2
  51   25  -42  -20  -14    1    5    7
 -12   40   -8  -16   -4   -4   -5   -5
  -8    3   17  -13   -4    0    2   -1
   2   14   14    5   -7    0   -1    0
  -1   -3   -2   12    0   -4   -2    1
  -6    2   -6    6    8   -5   -1    0

Default Intra Quantization Matrix QM[u][v] (B), DC at top left:
   8   16   19   22   26   27   29   34
  16   16   22   24   27   29   34   37
  19   22   26   27   29   34   34   38
  22   22   26   27   29   34   37   40
  22   26   27   29   32   35   40   48
  26   27   29   32   35   40   48   58
  26   27   29   34   38   46   56   69
  27   29   35   38   46   56   69   83

Quantizer Scale QS = 40 (from Rate Controller)

Pointwise division and rounding (A/B, with B scaled by QS) gives the quantized DCT coefficients T'[u][v]:
  35    1    2    1    0    0    0    0
   3    2   -1    0    0    0    0    0
   1    0   -1    0    0    0    0    0
   0    1    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0

Note: Quantization of the DC term is fixed and does not depend on QM or QS.

Default Quantization Matrices
Intra Matrix QM_I[u][v] (DC at top left):
   8   16   19   22   26   27   29   34
  16   16   22   24   27   29   34   37
  19   22   26   27   29   34   34   38
  22   22   26   27   29   34   37   40
  22   26   27   29   32   35   40   48
  26   27   29   32   35   40   48   58
  26   27   29   34   38   46   56   69
  27   29   35   38   46   56   69   83
Note: AC coefficients (all coefficients except DC) are first multiplied by 16, then divided by QS*QM_I[u][v]. The DC term is treated specially.

Non-Intra Matrix QM_N[u][v]:
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
  16   16   16   16   16   16   16   16
Note: All coefficients are first multiplied by 16, then divided by QS*QM_N[u][v].
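As a worked check of the intra rule, the sketch below quantizes the first row of the coefficient block from the Quantization Example slide with QS = 40. Several details are assumptions made for the illustration: the helper names, rounding half away from zero, and a fixed DC divisor of 8 (taken from the DC entry of the intra matrix; the slides only say the DC term is treated specially). The inverse step is the plain multiplication the slides describe, without the mismatch control and clipping of a real decoder.

```python
import math

QM_INTRA_ROW0 = [8, 16, 19, 22, 26, 27, 29, 34]    # first row of the default intra matrix
T_ROW0 = [276, 59, 89, 39, 7, -13, -12, -7]        # first row of T[u][v] from the example
QS = 40                                            # quantizer scale from the rate controller

def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def quantize_intra_row0(coeffs, qm_row, qs):
    """Quantize the first row of an intra block: DC first, then AC terms."""
    out = [round_half_away(coeffs[0] / 8)]          # assumed fixed DC divisor
    for t, qm in zip(coeffs[1:], qm_row[1:]):
        out.append(round_half_away(16 * t / (qs * qm)))
    return out

def dequantize_intra_row0(levels, qm_row, qs):
    """Inverse quantization as plain multiplication (no mismatch control)."""
    return [levels[0] * 8] + [q * qs * qm / 16 for q, qm in zip(levels[1:], qm_row[1:])]

levels = quantize_intra_row0(T_ROW0, QM_INTRA_ROW0, QS)
recon = dequantize_intra_row0(levels, QM_INTRA_ROW0, QS)
print(levels)
print(recon)
```

The quantized row matches the first row of the example, and the reconstructed coefficients (for instance 40 in place of 59) show the information that quantization discards.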

Downloadable Quant Matrices
- For improved quality in certain coding situations, quantization matrices for Intra and Non-Intra macroblocks can be downloaded.
- The decoder uses these instead of the defaults (which are not sent in the bitstream).
- The example below shows an improved Non-Intra Quant Matrix used by the MPEG-2 Test Model 5 (TM5).

Example of Downloadable Matrix (TM5 Non-Intra Matrix):
  16   17   18   19   20   21   22   23
  17   18   19   20   21   22   23   24
  18   19   20   21   22   23   24   25
  19   20   21   22   23   24   26   27
  20   21   22   23   25   26   27   28
  21   22   23   24   26   27   28   30
  22   23   24   26   27   28   30   31
  23   24   25   27   28   30   31   33

Quant Matrix Effect
[Figure: coefficient amplitude against frequency (DC at left), before and after quantization, showing the reconstruction levels for a Flat Matrix and for a Tilted Matrix.]

Quantization Artifacts
[Figure: original 8x8 blocks containing a vertical edge, a corner edge and a diagonal edge, shown reconstructed at QS = 2, QS = 5, QS = 10 and QS = 15.]
Shown after DCT, Quantization, Inverse Quantization and Inverse DCT using the default Intra Quantization Matrix and Linear Quantizer Scale.

Variable Length Coding (VLC) and Decoding (VLD)
[Figure: Image -> DCT -> Q -> VLC -> variable bit rate -> VLD -> Q^-1 -> DCT^-1 -> Reconstructed Image.]
- Quantization zeros out many DCT coefficients
- Zig-zag scanning of the quantized DCT coefficients yields runs of zeros
- Non-zero levels and runs of zeros can be coded efficiently using VLCs
- VLC causes variable bit rate output!

Run Length Coding
- Zeros of the 8x8 block are run length coded
- To optimize the runs, the block is zig-zag scanned
- DC coefficients are differenced from block to block and VLC'd
- Common run/level pairs are VLC'd

Quantized DCT coefficients (DC at top left), traversed by the zig-zag scan (MPEG-1 pattern):
  35    1    2    1    0    0    0    0
   3    2   -1    0    0    0    0    0
   1    0   -1    0    0    0    0    0
   0    1    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0

Corresponding run/level pairs: DC 35, then (0, 1) (0, 3) (0, 1) (0, 2) (0, 2) (0, 1) (0, -1) (3, 1) (0, -1), End of Block.
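The run/level pairs listed on the slide can be reproduced with a short sketch. The zig-zag order is generated by walking the anti-diagonals of the block in alternating directions; the function names are hypothetical, and the block is indexed as block[row][column] with DC at the top left.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in normal zig-zag order."""
    order = []
    for s in range(2 * n - 1):                       # s = row + column of each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                    # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_level_pairs(block):
    """Return the DC value and the (run of zeros, level) pairs of the AC coefficients."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for level in scan[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return scan[0], pairs                            # trailing zeros are implied by End of Block

block = [
    [35, 1, 2, 1, 0, 0, 0, 0],
    [3, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

dc, pairs = run_level_pairs(block)
print(dc, pairs)
```

Of the 63 AC coefficients only nine pairs need to be sent before End of Block, which is where the bit savings come from.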

MPEG-2 Enhancements
[Figure: the MPEG encoder block diagram (DCT, Q, IQ, IDCT, embedded decoder with Frame Mem 1 and Frame Mem 2, motion estimator, motion compensation, VLC and bitstream packer producing the MPEG-2 video bitstream with headers, DCT coefficients and motion vectors), annotated with the MPEG-2 enhancements listed below.]
- Field and Frame Pictures
- Field & Frame DCT
- Linear & Nonlinear QS
- Alternate Zig-Zag and VLC coding
- Field & Frame Prediction

MPEG-2 Zig-Zag Scan Options
[Figure: two 8x8 blocks of quantized DCT coefficients (DC at top left), one traversed by each scan pattern.]
- Normal Zig-Zag Scan. Mandatory in MPEG-1. Option in MPEG-2.
- Alternate Zig-Zag Scan. Not used in MPEG-1. Option in MPEG-2. For Frame DCT coding of interlaced video, more energy exists in the region this scan visits first, so run length coding is more efficient.

MPEG-2 Field/Frame DCT Coding
- Frame DCT: normal MPEG-1 mode of coding
- Field DCT: the macroblock is split into top and bottom fields
- The MPEG-2 encoder may choose Field DCT on any macroblock. The decoder must interpret the coding flag correctly, or severe errors will occur.
[Figure: a luminance macroblock on axes x, y, divided into blocks for Frame DCT coding and for Field DCT coding.]
Note: Chrominance blocks in 4:2:0 mode are always DCT coded in Frame order.

Variable Length Coding
- Huffman-type entropy coding
- Shorter codewords are assigned to more probable symbols (like Morse Code)
- Used for motion vectors, run/level pairs, type of macroblocks, etc.

Example: DCT AC coefficients
Run, Level   Codeword
0, 1         110
1, 1         0110
0, -1        111
7, -1        0001001
EOB          10

Example: Vectors (delta coded)
Delta   Codeword
0       1
1       010
2       0010
3       00010
4       0000110
5       00001010
...
15      000000011010
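To illustrate how such a table is used, here is a minimal sketch that encodes and decodes a few run/level symbols with only the five AC codewords shown above. It is not the full MPEG table; the function names are hypothetical. Decoding works bit by bit because no codeword is a prefix of another.

```python
# Only the five example codewords from the slide, keyed by (run, level) or "EOB".
AC_VLC = {
    (0, 1): "110",
    (1, 1): "0110",
    (0, -1): "111",
    (7, -1): "0001001",
    "EOB": "10",
}

def vlc_encode(symbols):
    """Concatenate the codewords of the given symbols into one bit string."""
    return "".join(AC_VLC[s] for s in symbols)

def vlc_decode(bits):
    """Split a bit string back into symbols by matching prefix-free codewords."""
    inverse = {code: sym for sym, code in AC_VLC.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            symbols.append(inverse[word])
            word = ""
    if word:
        raise ValueError("bit string ends inside a codeword")
    return symbols

message = [(0, 1), (1, 1), (0, -1), "EOB"]
bits = vlc_encode(message)
print(bits, len(bits))
```

The frequent pairs (0, 1) and (0, -1) cost three bits each, while the rare pair (7, -1) costs seven.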

Rate Control
[Block diagram: Image -> DCT -> Q -> VLC -> Buffer -> (constant bit rate) -> Buffer -> VLD -> Q^-1 -> DCT^-1 -> Reconstructed Image, with a Rate Controller driving Q]
A buffer is used to smooth out the bit rate.
Rate controller adjusts the quantizer to control buffer fullness and prevent overflow and underflow of the decoder's buffer (Video Buffer Verifier).
Buffer size affects image quality and overall delay.
Rate control algorithm is crucial for high quality compression.
Shown above is the basic structure for:
- Motion JPEG
- Intraframe H.261
- Intraframe MPEG

Temporal Prediction
[Block diagram. Encoder: Image minus Predicted Image -> DCT -> Q -> VLC -> Buf (CBR), with Rate Controller; embedded Q^-1, DCT^-1, adder and Frame Delay produce the Predicted Image. Decoder: Buf -> VLD -> Q^-1 -> DCT^-1 -> adder with Frame Delay -> Reconstructed Image]
To exploit redundancy in still portions of an image sequence, the difference between the input and the reconstructed previous frame is coded.
Encoder gets more complex and includes a copy of the decoder (called an embedded decoder).
Moving areas are not coded well using this scheme, so MPEG uses Motion Compensated Prediction.

Motion Compensated Prediction
[Block diagram. Encoder: Image minus Predicted Image gives the Residual Image -> DCT -> Q -> VLC -> BUF (CBR), with Rate Controller; embedded Q^-1, DCT^-1 and adder give the Reconstructed Image, which feeds Motion Estimation and the Motion Compensator. Decoder: BUF -> VLD -> Q^-1 -> DCT^-1 -> adder with Motion Compensator -> Reconstructed Image. Motion Vectors go to both Motion Compensators.]
Most motion is predictable, and motion compensation exploits this fact.
Motion Estimation is the process by which motion vectors are computed in the encoder. It can be quite computationally intensive.
Motion vectors are used by the Motion Compensators in the encoder and decoder to produce Predicted Images from Reconstructed Images.
We now have P frames.

A Typical Motion Estimation Architecture
[Block diagram. Labels: input image; Coarse ME; coarse motion vectors; Fine ME; refined motion vectors; MC; predicted image; DCT/Q; VLC; Q^-1/DCT^-1; recon. image]
Coarse motion vectors are computed from input images.
Refined motion vectors, e.g., half-pel refinement, are computed from reconstructed images.
Good compromise between true motion and small error.
Used in MPEG-2 Test Model 5.

How Does Motion Compensated Prediction Save Bits?
[Figure: block F in the Previous I or P Picture, displaced by motion vector MV_F from the Current Macroblock X in the Current P or B Picture]
Instead of sending quantized DCT coefficients of X, send:
1. Quantized DCT coefficients of X-F (prediction error). If prediction is good, error will be near zero and will code with fewer bits.
2. MV_F, the motion vector. This will be differentially coded with respect to its neighboring vector, and will code efficiently.
This will typically result in 50% - 80% savings in bits.

Gray-Scale Statistics of Prediction Error
[Figure: one frame of the original image pair and the prediction error, each with its gray-scale histogram]

Forward Motion Estimation
...used in P and B frames...
[Figure, time running from the Previous I or P Picture to the Current P or B Picture, each with its MB grid. Labels:
- Search Area
- Position of "zero motion vector" MB (center of search area)
- Motion Vector (e.g., [-20.5, +20.5])
- Position of "best match" MB (to half-pixel accuracy; need not be aligned to MB grid)
- Position of current Macroblock (aligned to MB grid)]

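ME Matching Metrics
[Figure: search area with a candidate block at offset (k,l) from the MB grid position; current macroblock on the MB grid; pixel indices i, j]
X' = 16x16 prediction MB
X = 16x16 current MB
Minimum Mean Absolute Error: $\mathrm{MMAE} = \min_{k,l} \frac{1}{256} \sum_{i,j} |X - X'|$
Minimum Mean Squared Error: $\mathrm{MMSE} = \min_{k,l} \frac{1}{256} \sum_{i,j} (X - X')^2$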
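A minimal full-search sketch of the mean-absolute-error metric follows. It assumes integer-pel offsets only (no half-pel refinement), pictures stored as lists of rows, and a small block size for the demo; block_mae and full_search are illustrative names, not part of any standard or reference encoder.

```python
def block_mae(cur_mb, ref, top, left):
    """Mean absolute error between cur_mb and the same-size block of ref at (top, left)."""
    n = len(cur_mb)
    total = sum(abs(cur_mb[i][j] - ref[top + i][left + j])
                for i in range(n) for j in range(n))
    return total / (n * n)          # 1/256 for a 16x16 macroblock


def full_search(cur_mb, ref, mb_top, mb_left, search_range):
    """Try every integer offset (k, l) in the search area; keep the minimum MAE.

    Returns (error, (dx, dy)) where dx is the horizontal and dy the vertical offset.
    """
    n = len(cur_mb)
    best = None
    for k in range(-search_range, search_range + 1):        # vertical offset
        for l in range(-search_range, search_range + 1):    # horizontal offset
            top, left = mb_top + k, mb_left + l
            if top < 0 or left < 0 or top + n > len(ref) or left + n > len(ref[0]):
                continue                                     # candidate leaves the picture
            err = block_mae(cur_mb, ref, top, left)
            if best is None or err < best[0]:
                best = (err, (l, k))
    return best


# Demo: a 12x12 reference picture in which every pixel value is unique.
ref = [[100 * r + c for c in range(12)] for r in range(12)]
# The current 4x4 block sits at grid position (4, 4) but its content comes from
# one row down and two columns right in the reference picture.
cur = [row[6:10] for row in ref[5:9]]
print(full_search(cur, ref, 4, 4, 2))
```

Swapping abs(...) for a squared difference gives the mean-squared-error variant. The exhaustive double loop is also why the deck calls motion estimation computationally intensive.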

Example of Forward Motion Estimation
Case: Good prediction for still objects.
[Figure: Previous I or P Picture with Macroblock Grid and Search Area; Current P Picture with the current MB shown with heavy outline]
Previous I or P Picture: within the search area, a good match is found for this still object.
Current P Picture: since a match is found, this MB is intercoded.

Example of Forward Motion Estimation
Case: Dealing with featureless regions.
[Figure: Previous I or P Picture with Macroblock Grid and Search Area; Current P Picture with the current MB shown with heavy outline]
Previous I or P Picture: within the search area, many good matches are found. Encoder must pick one and send the appropriate motion vector.
Current P Picture: since a match is found, this MB is intercoded.

Example of Forward Motion Estimation
Case: Good prediction for linearly translating objects.
[Figure: Previous I or P Picture with Macroblock Grid and Search Area; Current P Picture with the current MB shown with heavy outline]
Previous I or P Picture: within the search area, a good match is found for this moving object. Encoder sends the appropriate forward motion vector.
Current P Picture: since a match is found, this MB is intercoded.

Example of Forward Motion Estimation
Case: A good prediction might be missed because it is outside the search area.
[Figure: Previous I or P Picture with Macroblock Grid and Search Area; Current P Picture with the current MB shown with heavy outline]
Previous I or P Picture: within the search area, no good match is found. Note that a good match would be found with a larger search area. Search area is an important encoder design parameter.
Current P Picture: since no match is found, this MB is intracoded.

Example of Forward Motion Estimation
Case: A good prediction might come from an unrelated object.
[Figure: Previous I or P Picture with Macroblock Grid and Search Area; Current P Picture with the current MB shown with heavy outline]
Previous I or P Picture: within the search area, a good match is found, but within a different object. There is no requirement that motion vectors represent true motion of objects.
Current P Picture: since a match is found, this MB is intercoded.

Example of Forward Motion Estimation
Case: Prediction Error should have low energy.
[Figure: Previous I or P Picture; Current P Picture; Prediction Error Picture with Macroblock Grid, MB Type and Motion Vectors superimposed (I = Intra, P = Inter). Nearly all macroblocks are marked P; one is marked I.]

Example of Backward Motion Estimation
Case: Handles uncovered objects missed by forward prediction.
Previous I or P Picture: searching here finds no good match because some features are partially hidden.
Current B Picture: current MB is shown with heavy outline.
Next I or P Picture: searching here finds a good match because features are now uncovered.

Forward/Backward/Interpolated Decision
...must be made for every non-intra macroblock in a B picture...
[Figure: F in the Previous I or P Picture, X in the Current B Picture, B in the Next I or P Picture, with vectors MV_F and MV_B]
Define:
- X = Current MB
- F = Best MB in previous I or P Picture
- B = Best MB in next I or P Picture
- MV_F = MV corresponding to F's displacement from X
- MV_B = MV corresponding to B's displacement from X
Compute: Goodness of F, B and (F+B)/2 as predictors for X.
Decide:
- If F is best, send MV_F (Forward Prediction)
- If B is best, send MV_B (Backward Prediction)
- If (F+B)/2 is best, send MV_F and MV_B (Interpolated Prediction)
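The decision can be sketched as below, assuming mean absolute error as the "goodness" measure (the slide does not say which measure is used) and macroblocks flattened to plain lists of pixel values. The function names are illustrative.

```python
def mean_abs_error(block, prediction):
    """Average absolute pixel difference between a block and its prediction."""
    return sum(abs(x - p) for x, p in zip(block, prediction)) / len(block)


def choose_b_prediction(x, f, b):
    """Pick forward, backward or interpolated prediction for macroblock x.

    x, f, b are equal-length lists of pixel values (current MB, best forward
    match, best backward match). The error measure is an assumption.
    """
    candidates = {
        "forward": f,                                              # send MV_F
        "backward": b,                                             # send MV_B
        "interpolated": [(fi + bi) / 2 for fi, bi in zip(f, b)],   # send MV_F and MV_B
    }
    errors = {mode: mean_abs_error(x, pred) for mode, pred in candidates.items()}
    best = min(errors, key=errors.get)   # on ties the first listed mode wins
    return best, errors


# A block whose brightness lies halfway between its two references.
mode, errors = choose_b_prediction([10, 20, 30, 40], [0, 10, 20, 30], [20, 30, 40, 50])
print(mode, errors)
```

In the demo both single-direction predictions are off by 10 per pixel while their average matches exactly, so interpolated prediction is chosen at the cost of sending two vectors.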

Motion Vector Coding Example
Motion vectors (MVs) shown for 8 successive macroblocks:
  MV x:    3   10   30   30  -14  -16   27   24
  MV y:  -10  -10   -9   -9  -11  -11  -10  -10
Assume all [x, y] for picture in RANGE [-32, 31] => f_code = 2, MODULUS = 64.
Differential MVs ([0,0] used as predictor for first MV):
  dMV x:   3    7   20    0  -44   -2   43   -3
  dMV y: -10    0    1    0   -2    0    1    0
Add or subtract MODULUS if out of RANGE. Keeps all values in RANGE:
  dMV x:   3    7   20    0   20   -2  -21   -3
  dMV y: -10    0    1    0   -2    0    1    0
Convert to VLCs using Table 2-B.4 in the MPEG-1 Video spec:
  VLC x: 0101, 000101, 00000100100, 10, 00000100100, 0110, 00000100110, 0110
  VLC y: 000010110, 10, 11, 10, 0111, 10, 11, 10
VLCs used in this example are for illustration only.
Note that the vertical components of the MVs are much more correlated than the horizontal components. Therefore, the MV differentials for the vertical components code with fewer bits.
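The differencing and modulus wrap in this example can be reproduced with a short sketch. It assumes the slide's RANGE of [-32, 31] and a zero predictor for the first vector; the function names are illustrative, and the VLC table lookup is left out.

```python
def differential_mvs(values, low=-32, high=31):
    """Difference each component from its predecessor, wrapped into [low, high]."""
    modulus = high - low + 1          # 64 for the range [-32, 31]
    predictor, out = 0, []            # 0 is the predictor for the first MV
    for v in values:
        d = v - predictor
        if d < low:
            d += modulus
        elif d > high:
            d -= modulus
        out.append(d)
        predictor = v
    return out


def reconstruct_mvs(diffs, low=-32, high=31):
    """Decoder side: add each differential to the predictor and wrap the same way."""
    modulus = high - low + 1
    predictor, out = 0, []
    for d in diffs:
        v = predictor + d
        if v < low:
            v += modulus
        elif v > high:
            v -= modulus
        out.append(v)
        predictor = v
    return out


mv_x = [3, 10, 30, 30, -14, -16, 27, 24]
mv_y = [-10, -10, -9, -9, -11, -11, -10, -10]
print(differential_mvs(mv_x))
print(differential_mvs(mv_y))
```

The wrap is lossless because the decoder applies the same modulus: a differential of 20 after x = 30 gives 50, which is out of range and wraps back to -14.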

MPEG-2 Prediction Modes
Frame Prediction
- in a frame picture, field prediction or frame prediction is selected on a macroblock basis
Field Prediction
- predictions are made independently for each field
- in a field picture, all predictions are field predictions
Dual Prime
- can be used in field pictures or frame pictures
- can only be used in P pictures
- one MV plus a differential MV sent per macroblock
16x8 Motion Compensation
- can only be used in field pictures
- two MVs are sent for forward or backward prediction
- first MV used for upper 16x8 region, second MV for lower
- four MVs are sent for bi-directional prediction

Allowable MPEG-2 Prediction Modes
Frame Pictures: Frame Prediction, Field Prediction, Dual Prime
Field Pictures: Field Prediction, 16x8 Motion Compensation, Dual Prime

Prediction in Frame Pictures
Frame Prediction
[Figure: 16x16 region in the Reference Frame predicting the 16x16 Current MB in the Predicted Frame]
Best 16x16 region in Reference Picture determines frame MV for 16x16 MB. Only mode allowed in MPEG-1.
Field Prediction
[Figure: 16x8 regions in the Top Field or Bottom Field of the reference predicting the 16x8 Top Field of Current MB and the 16x8 Bottom Field of Current MB]
Best 16x8 region in Top or Bottom field in Reference Picture determines field MVs for Top and Bottom portions of 16x16 MB.

Dual-Prime Prediction
In Frame Pictures
[Figure: 16x8 regions from the Top Field and Bottom Field of the Reference Frame are averaged to predict the 16x8 Top Field and 16x8 Bottom Field of the Current MB in the Predicted Frame]
Single MV (heavy arrow) sent in bitstream; this represents predictions from fields of same parity.
Small differential MVs are also sent; these represent offset predictions from fields of opposite parity.
Same and opposite field predictions are averaged to form the final prediction for each 16x8 region of the current MB.
In Field Pictures
[Figure: 16x16 regions from the First Field and Second Field are averaged to predict the current 16x16 MB; one field is marked "This Field not yet decoded."]
Single MV (heavy arrow) sent in bitstream; this represents prediction from the field of same parity.
A small differential MV is also sent; this represents an offset prediction from the field of opposite parity.
Same and opposite field predictions are averaged to form the final prediction for the current 16x16 MB.

Dual-Prime Prediction in V-T
[Figure: Top and Bottom fields of the Reference Picture and of the Predicted Picture. Legend:
- Vector Transmitted in Bitstream for Same Parity Fields
- Differential Vector Transmitted in Bitstream (limited to values -1, 0, +1)
- Vector Derived at Decoder for Opposite Parity Fields]

Concealment Motion Vectors
An MPEG-2 enhancement; not a requirement.
Helps in concealing errors when data is lost.
Concealment motion vectors (CMVs), if sent, are coded with Intra macroblocks (MBs).
CMVs should be used in MBs immediately below the one in which the CMV occurs.
[Figure: group of Intra-coded Macroblocks with CMVs; "Use CMVs in this row for MBs below"; "Macroblocks in this row are lost"]

Inter/Intra Decision
[Block diagram: the motion compensated codec with Intra/Inter switches (positions 0 and 1) added in encoder and decoder. Labels: Image; Rate Controller; DCT; Q; VLC; BUF (CBR); VLD; Q^-1; DCT^-1; Motion Estimation; Intra/Inter Decider; Intra/Inter Mode; Motion Vectors; Motion Compensator; Reconstructed Image]
On a macroblock basis, decide whether it's more efficient to code the original signal or the motion compensated prediction error.
Some pictures are coded entirely intraframe (I-pictures). This is useful for resetting the prediction loop and for editing.
Basic structure of H.261 codec.

Selection of Macroblock Type
...following the MPEG-1 simulation model...
1. MC vs. No MC
   - if Motion Compensation is best, select "MC" and transmit motion vector(s); if B picture, select forward, backward or interpolated
   - otherwise, select "No MC"; do not transmit motion vector; it is assumed to be 0
2. Intra vs. Inter
   - should the MV found in step 1 be used? If so, select "Inter"
3. Coded vs. Not Coded
   - if quantized prediction error is zero, select "Not Coded"
4. Quant vs. No Quant
   - if quantizer scale needs to be changed, select "Quant"

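Example of MB Type Selection for P Pictures
[Decision tree starting at "Begin"; outcomes listed by path]
  MC, Non Intra, Coded, Quant        -> pred-mcq
  MC, Non Intra, Coded, No Quant     -> pred-mc
  MC, Non Intra, Not Coded           -> pred-m
  Intra, Quant                       -> intra-q
  Intra, No Quant                    -> intra-d
  No MC, Non Intra, Coded, Quant     -> pred-cq
  No MC, Non Intra, Coded, No Quant  -> pred-c
  No MC, Non Intra, Not Coded        -> skipped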
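Read as a lookup, the P-picture tree maps four yes/no decisions to a macroblock type name. The sketch below is one reading of the flattened diagram, with a hypothetical function name. It assumes the Intra branch ignores the MC decision, which the extracted slide text does not make explicit.

```python
def p_picture_mb_type(mc, intra, coded, quant):
    """Map the four decisions of the P-picture tree to a macroblock type name.

    mc:    motion compensation selected (non-zero vector transmitted)
    intra: macroblock is intra coded (assumed to override the MC decision)
    coded: quantized prediction error is non-zero
    quant: a new quantizer scale is sent with this macroblock
    """
    if intra:
        return "intra-q" if quant else "intra-d"
    if not coded:
        return "pred-m" if mc else "skipped"
    base = "pred-mc" if mc else "pred-c"
    return base + ("q" if quant else "")


for decisions in [(True, False, True, True), (True, False, False, False),
                  (False, False, True, False), (False, False, False, False),
                  (False, True, True, True)]:
    print(decisions, "->", p_picture_mb_type(*decisions))
```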

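Example of MB Type Selection for B Pictures
[Decision tree starting at "Begin"; outcomes listed by path]
  MC, Forward, Coded, Quant          -> pred-fcq
  MC, Forward, Coded, No Quant       -> pred-fc
  MC, Forward, Not Coded             -> pred-f or skipped
  MC, Backward, Coded, Quant         -> pred-bcq
  MC, Backward, Coded, No Quant      -> pred-bc
  MC, Backward, Not Coded            -> pred-b or skipped
  MC, Interpolated, Coded, Quant     -> pred-icq
  MC, Interpolated, Coded, No Quant  -> pred-ic
  MC, Interpolated, Not Coded        -> pred-i or skipped
  No MC, Intra, Quant                -> intra-q
  No MC, Intra, No Quant             -> intra-d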

Macroblocks and Quantizer Scale Codes
Quantizer Scale Codes are 5-bit integers sent in every slice header and selected MB headers.
Decoder uses the most recent value for all subsequent MBs until another Quantizer Scale Code is encountered.
[Figure: a single slice. Quant scales coded in the bit stream are shown plain; values the decoder uses are shown in parentheses:
  Slice Header 9 | (9) (9) (9) 5 (5) 4 (4) 6 (6) (6) (6)]
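The "most recent value wins" rule is easy to mimic. In this sketch None stands for a macroblock header that carries no quantizer scale code. The function name and the list representation are assumptions made for illustration.

```python
def effective_quant_scales(slice_code, mb_codes):
    """Return the quantizer scale code the decoder applies to each macroblock.

    slice_code: code sent in the slice header
    mb_codes:   per-macroblock code, or None when the MB header sends none
    """
    current, used = slice_code, []
    for code in mb_codes:
        if code is not None:
            current = code        # a new code replaces the running value
        used.append(current)      # otherwise the most recent value is reused
    return used


# The slice from the slide: 9 in the slice header, then 5, 4 and 6 sent later.
sent = [None, None, None, 5, None, 4, None, 6, None, None, None]
print(effective_quant_scales(9, sent))
```

The output matches the slide, with each parenthesised value inherited from the last code actually sent.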

Skipped Macroblocks
MBs cannot be skipped in I Pictures.
MBs can be skipped in P and B pictures if certain rules apply.
[Figure: a slice within a portion of a P or B Picture]
The first MB of a slice must be coded.
The last MB of a slice must be coded.
The MBs in between can be skipped if:
1) all quantized DCT coeffs = 0, and
2) all MVs = 0 (in P pictures), or all MV differentials = 0 (in B pictures)

Forward Analysis and Resequencing
Forward Analysis is a look-ahead technique that can be used to help the Rate Controller adjust quantization in a more optimal fashion.
[Block diagram: the inter/intra codec with a Reseq block and a Forward Analyzer added at the encoder input and a Reseq block at the decoder output. Labels: Image; Reseq; Forward Analyzer; Motion Estimation; Intra/Inter Decider; Rate Controller; DCT; Q; Q^-1; DCT^-1; VLC; BUF (CBR); VLD; Intra/Inter Mode; Motion Vectors; Motion Compensator; Reconstructed Image]
B frames must be resequenced from display to coding order.
Basic structure of MPEG codec.

MPEG Bit Stream Structure
Sequence layer: Sequence Header | Sequence | Sequence Header | Sequence
- Sequence Header carries: Picture width, Picture height, Aspect ratio, Bitrate, Picture rate, ...
GOP layer: GOP Header | Picture Header | Picture | ... | Picture Header | Picture
- Picture Header carries: Temporal Reference, Picture Type, VBV Delay, Extension Start Code, Picture Structure, ...

MPEG Bit Stream Structure (Cont'd.)
Picture layer: Picture Header | Slice | ...
Slice layer: Slice Header | Macroblock | ... | Macroblock | Slice Header | Macroblock | ...
Macroblock layer: Address | Type | Quantizer Scale | Motion Vectors | Coded Block Pattern | Block | ... | Block
Block layer: Block

3:2 Pulldown
MPEG-2 provides a mechanism for film-originated content to be coded at 24 frames/sec but displayed at 30 frames/sec.
The lower frame rate of film means it can be coded at the same quality as 30 frames/sec video, but at a lower bit rate.
The repeat_first_field (rff) and top_field_first (tff) flags allow decoders to recreate the 3:2 pulldown sequence for display.
[Figure: film frames (1/24 sec each), coded as progressive frames at 24 frames/sec, carry the flags rff=1 tff=1; rff=0 tff=0; rff=1 tff=0; rff=0 tff=1. 3:2 pulldown alternately creates 3 and 2 displayed fields (1/60 sec each, 1/30 sec per displayed frame) for each input frame; the extra field is a repeat of the first field.]
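How a display process might expand the two flags into fields can be sketched as follows. This is an illustration only: the frame names A to D and the function name are made up, and the flag values are the four combinations shown on the slide.

```python
def pulldown_fields(frames):
    """Expand coded frames into the displayed field sequence.

    frames: list of (name, top_field_first, repeat_first_field) tuples.
    Returns a list of (name, parity) pairs in display order.
    """
    fields = []
    for name, tff, rff in frames:
        first, second = ("top", "bottom") if tff else ("bottom", "top")
        fields.append((name, first))
        fields.append((name, second))
        if rff:
            fields.append((name, first))   # the first field is shown again
    return fields


# Four film frames with the flag pattern from the slide.
film = [("A", 1, 1), ("B", 0, 0), ("C", 0, 1), ("D", 1, 0)]
for field in pulldown_fields(film):
    print(field)
```

Four coded frames become ten fields (3 + 2 + 3 + 2), so 24 frames/sec turns into 60 fields/sec, and the tff values keep top and bottom fields strictly alternating.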

Pan-and-Scan
MPEG-2 provides a mechanism for panning a display rectangle around a reconstructed frame.
Horizontal and vertical offsets are specified to 1/16 pixel resolution and can be sent for every displayed field.
This allows widescreen material to be viewed on 4:3 displays.
[Figure: 4:3 Display Rectangle inside a 16:9 Reconstructed Frame, offset by frame_centre_horizontal_offset]
In this example the horizontal frame center offset is a positive number.

MPEG-2 Video Decoding Process
[Block diagram. Labels: MPEG-2 Bitstream; Parsing; VLD (DCT Coeffs); Zig-Zag Scan Mode; Inv Scan; Quant Scale Factor & Quant Matrices; Q^-1; DCT^-1; VLD (Motion Vectors); Vector Predictors; Dual Prime Arithmetic; Chroma Scaling; Field/Frame Prediction Selection; Framestore Addressing; Frame Stores; Half-Pel Info; Half-Pel Prediction Filtering; Combine Predictions; adder; Sat.; Decoded Pixels]
NOTE: This is a simplified, high-level functional diagram that integrates several separate diagrams in the MPEG-2 Video Spec (ISO/IEC 13818-2).

Special Topics
- More About Rate Control
- The Video Buffer Verifier
- MPEG-2 Profiles and Levels
- Statistical Multiplexing
- Practicing the Art of MPEG

Rate-Distortion Curve
As the rate increases, the distortion decreases.
For a given distortion, the rate increases with complexity.
At zero distortion, the source is coded at its entropy, R_n.
At zero rate, the source is not coded. The distortion is equal to the source energy, σ_n^2.
[Plot: Rate (R_1, R_2, R_3) versus Distortion (0, σ_1^2, σ_2^2, σ_3^2), one curve per source in order of increasing complexity]

Distortion and Quant Scale
As quant scale increases, so does distortion.
For a given quant scale, the distortion generally increases with complexity.
[Plot: Distortion (up to σ_n^2) versus Quantizer Scale Code (1 to 25), curves in order of increasing complexity]

Bit Rate vs. Quant Scale
As quant scale decreases, the bit rate increases.
For a given quant scale, the bit rate increases with complexity.
For minimum distortion, use the smallest quant scale.
[Plot: Rate (e.g., bits/picture; R_1, R_2, R_3) versus Quantizer Scale Code (1 to 25), curves in order of increasing complexity]

Constant Quality Encoding
For a given picture type (I, P or B), constant quality is achieved with a fixed quant scale.
For sequences with mixed picture types, B pictures can be coded with somewhat lower picture quality, since they are not used as the basis for prediction.
[Chart: Quant Scale Code (5, 10, 15) versus frames (display order) for a sequence of I, P and B pictures. Example showing B pictures with higher quant scale (i.e., lower quality).]

Constant Quality => VBR
With a fixed quant scale, the bit rate increases with complexity.
This implies variable bit rate (VBR) encoding.
[Chart: Bits/Picture (kbits; 100, 300, 500) versus frames (display order) for an all I-frame sequence with fixed quant scale (constant quality encoding), moving from a simple scene to a moderately complex scene to a complex scene]

CBR => Variable Quality
For many applications, constant bit rate (CBR) encoding is required.
This can lead to highly variable image quality.
[Chart: Bits/Picture (kbits; 100, 300, 500) versus frames (display order) for an all I-frame sequence coded at 300 kbit/picture (CBR) with variable quant scale:
- simple scene: these pictures need more bits (lower quant scale or add stuffing)
- moderately complex scene: these pictures are just about right
- complex scene: these pictures need fewer bits (increase quant scale)]

CBR Rate Control
Goal is to achieve high quality at constant bit rate.
To achieve a constant bit rate, a buffer is used to smooth out high variability in bits/frame.
In practice, I frames are often given highest quality, since they form the basis of prediction for all other pictures in the GOP.
As complexity increases, the quant scale, on average, is increased to avoid buffer overflow.
To approach constant quality from frame to frame, bits are stolen from simple frames and given to complex frames.
To approach constant quality within a frame, bits are stolen from simple areas and given to complex areas.

What is the Video Buffer Verifier (VBV)?
The VBV is a hypothetical input rate buffer for the video decoder, which is connected to the output of an encoder.
The encoder keeps track of the VBV fullness, and must ensure that it does not overflow or underflow.
Assuming constant end-to-end delay, the encoder buffer is the mirror image of the VBV.
[Diagram: Video -> MPEG Encoder -> Output Rate Buffer -> MPEG Video Bitstream -> Input Rate Buffer (VBV) -> MPEG Decoder -> Video]

MPEG's Video Buffer Verifier: Water Tank Analogy (Normal Operation)
[Figure: tank of capacity B with constant flow in and a shuttered bottom; plot of tank fullness versus time (0, T, 2T, ..., 6T) with levels B1 and B2]
Tank fills at constant rate B2/2T until fullness B2 is reached. (Slope = flow rate)
Volume of water (B2-B1) is extracted instantaneously every T seconds starting at 2T.
MPEG analogs:
- Tank = Video Buffer Verifier (hypothetical decoder buffer)
- B = VBV buffer size (in bits)
- T = output frame period
- Constant flow = constant input bit rate = B2/2T bits/sec
- Extracted volume = coded bits in each picture (B2-B1)
- 2T = VBV delay for each picture
NOTE: In general, coded bits per picture varies greatly!
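The tank analogy translates directly into a toy simulation: fill at a constant number of bits per frame period, then remove one picture's bits instantaneously. The sketch below is a simplification under stated assumptions (constant bit rate, a start-up delay of two frame periods as on the slide, and checks made only at extraction instants). The function name and the numbers are illustrative.

```python
def simulate_vbv(buffer_size, bits_per_period, picture_bits, startup_periods=2):
    """Step a hypothetical decoder buffer through a list of coded picture sizes.

    Returns ("ok", None), ("overflow", n) or ("underflow", n), where n is the
    index of the picture at whose extraction time the problem is detected.
    """
    fullness = bits_per_period * startup_periods     # level B2 reached at time 2T
    for n, bits in enumerate(picture_bits):
        if fullness > buffer_size:
            return ("overflow", n)      # tank is over capacity before extraction
        if bits > fullness:
            return ("underflow", n)     # picture n has not fully arrived yet
        fullness -= bits                # instantaneous extraction of picture n
        fullness += bits_per_period     # constant-rate fill until the next picture
    return ("ok", None)


B = 1000       # tank capacity (VBV buffer size), arbitrary units
FILL = 400     # bits arriving per frame period T (so B2 = 800 and B1 = 400)
print(simulate_vbv(B, FILL, [400] * 6))   # pictures of (B2 - B1) bits
print(simulate_vbv(B, FILL, [200] * 6))   # pictures of (B2 - B1)/2 bits
print(simulate_vbv(B, FILL, [600] * 6))   # pictures of 3*(B2 - B1)/2 bits
```

The three calls correspond to the normal, overflow and underflow cases of the water tank slides. Pictures that are consistently too small let the tank run over, and pictures that are consistently too large drain it.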

MPEG's Video Buffer Verifier: Water Tank Analogy (Overflow Condition)
[Figure: tank of capacity B with constant flow in and a shuttered bottom; plot of tank fullness versus time (0, T, 2T, ..., 6T) with levels B1 and B2, rising until "Overflow!"]
Tank fills at constant rate B2/2T.
Volume of water (B2-B1)/2 is extracted instantaneously every T seconds starting at 2T.

MPEG's Video Buffer Verifier: Water Tank Analogy (Underflow Condition)
[Figure: tank of capacity B with constant flow in and a shuttered bottom; plot of tank fullness versus time (0, T, 2T, ..., 6T) with levels B1 and B2, falling until "Underflow!"]
Tank fills at constant rate B2/2T.
Volume of water 3*(B2-B1)/2 is extracted instantaneously every T seconds starting at 2T.

VBV Buffer Size and VBV Delay
[Plot: VBV fullness versus time (-T/2, 0, T, 2T, ..., 8T); B = vbv_buffer_size (bits); fullness levels b(1), b(2), b(3); every fill segment has slope R; all bits for a picture are removed at once (marked for Picture 1 and Picture 4); intervals vbv_delay(1), vbv_delay(2), vbv_delay(3)]
NOTE: Slopes are all equal in Constant Bit Rate operation!
vbv_delay(n) tells the decoder how long to wait before extracting bits for the nth picture, assuming an initially empty buffer.
vbv_delay(n) = 90,000*b(n)/R, where R = bit rate in bits/sec.
Note that vbv_delay(n) is therefore proportional to fullness.
Bitstream layout:
  Sequence Header: vbv_buffer_size (in units of 16*1024 bits)
  GOP Header
  Picture Header: vbv_delay(1) (in units of 90kHz clocks) | Coded Bits for Pict 1
  Picture Header: vbv_delay(2) (in units of 90kHz clocks) | Coded Bits for Pict 2
  Picture Header: vbv_delay(3) (in units of 90kHz clocks) | Coded Bits for Pict 3
  Picture Header | Coded Bits for Pict 4
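A small worked example of the two unit conversions on this slide is given below. The helper names are hypothetical. The 4 Mbit/s rate, the 1,000,000-bit fullness and the buffer size of 112 units are example figures chosen here, not values from the slide. Integer division is used because the coded field is an integer count of 90 kHz ticks.

```python
def vbv_delay_ticks(fullness_bits, bit_rate):
    """vbv_delay(n) = 90,000 * b(n) / R, expressed in 90 kHz clock ticks."""
    return 90_000 * fullness_bits // bit_rate


def vbv_buffer_bits(vbv_buffer_size_units):
    """Convert the coded vbv_buffer_size (units of 16*1024 bits) to bits."""
    return vbv_buffer_size_units * 16 * 1024


# Example figures (assumed): 4 Mbit/s stream, 1,000,000 bits in the buffer.
ticks = vbv_delay_ticks(1_000_000, 4_000_000)
print(ticks, "ticks =", ticks / 90_000, "seconds")
print(vbv_buffer_bits(112), "bits")
```

Since the buffer fills at R bits/sec, reaching a fullness of b(n) bits takes b(n)/R seconds. The factor 90,000 only converts seconds into 90 kHz ticks.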

CBR vs. VBR: VBV Models
CBR: VBV fills at actual bit rate.
[Plot: VBV Fullness versus Time, slope = R_act]
VBR: VBV fills at max bit rate until full, then waits.
[Plot: VBV Fullness versus Time, slope = R_max]

Profiles and Levels
Problem: A decoder that could decode any MPEG-2 bitstream would be prohibitive in terms of memory and performance. Decoder manufacturers might choose proprietary subsets of the syntax, preventing interoperability.
Solution: Pre-defined subsets of the syntax: Profiles & Levels create compliance points.
Profile: A defined subset of syntax elements in MPEG-2 (e.g., 4:2:0 only, I/P frames only, field DCT, etc.)
Level: Parameter constraints on those syntax elements (e.g., max Picture Size, max Bit Rate, max Vertical Motion Vector, max Buffer Size, etc.)

Profiles and Levels
Profiles: Simple, Main, SNR, Spatial, High, 4:2:2
Levels: Low, Main, High-1440, High
Not all Profile/Level combinations are allowed.
Main Profile:
- B Frames supported (not so in Simple Profile)
- 4:2:2 and 4:4:4 not supported
- Scalable Modes not supported
- Restricted slice structure
Main Level:
- max Picture size: 720x576, 30 frames/sec
- max Bitrate: 15 Mbps
- max Buffer size: 1.835008 Mbits
A Compliance Point is a Profile at a Level, e.g., Main Profile at Main Level, MP@ML.

Profiles and Levels
[Table of compliance points. Profiles: Simple, Main, SNR, Spatial, High, 4:2:2. Key: max H size, max V size, max frame rate.]
High level:
- Main: 1920H 1152V 60Hz (ATSC Formats)
- High: 1920H 1152V 60Hz / 960H 576V 30Hz
- 4:2:2: SMPTE 308M
High-1440 level:
- Main: 1440H 1152V 60Hz
- Spatial: 1440H 1152V 60Hz / 720H 576V 30Hz
- High: 1440H 1152V 60Hz / 720H 576V 30Hz
Main level:
- Simple: 720H 576V 30Hz
- Main: 720H 576V 30Hz
- SNR: 720H 576V 30Hz
- High: 720H 576V 30Hz / 352H 288V 30Hz
- 4:2:2: 720H 512V/608V 30Hz
Low level:
- Main: 352H 288V 30Hz
- SNR: 352H 288V 30Hz
Notes:
1) A split entry (a / b) shows constraints on Enhancement Layer (left) and Base Layer (right).
2) In general, a compliant decoder must also handle all lower Profile and Level compliance points.

Statistical Multiplexing (Stat Mux)
Stat mux exploits the fact that the coding complexities of a selection of video sources, at any given time, are usually quite different.
For a large group of video sources, there might be only one or two difficult scenes at any given time.
Stat mux uses variable bit rate (VBR) encoding to give more bits to the more difficult scenes.

Typical Stat Mux Encoder
[Block diagram: Video 1, Video 2, Video 3 -> Encoder 1, Encoder 2, Encoder 3 -> VBR Bitstream 1, 2, 3 -> Mux -> CBR Bitstream (Multi-Program Multiplex), with a Stat Mux Controller]
The bit rates of the individual encoders are adjusted so that the total bit rate is constant.
Depending on the algorithm, the individual bit rates can be adjusted at, for instance, a picture or GOP level.

Bit Rate and Buffer Issues
The bit rates and buffer sizes in a stat mux system cannot be arbitrarily chosen.
To prevent buffer underflow or overflow, it is sufficient that the following relationship hold:
$D_{size} = \frac{r_{max}}{r_{min}} E_{size}$
where
- D_size = decoder buffer size
- E_size = encoder buffer size
- r_max = maximum instantaneous bit rate
- r_min = minimum instantaneous bit rate

Why Use Stat Mux?
Stat Mux can increase the number of coded programs in a fixed bandwidth, without decreasing the quality of any program.
Broadcasters love this, since it means squeezing even more programs into a channel or transponder!
Stat Mux R&D is still in its infancy, and algorithms are highly proprietary.
Existing Stat Mux products achieve this goal with varying degrees of success.

Practicing the Art of MPEG

MPEG Artifacts: What to look for
Blocky Artifacts
- seen when the eye tracks a fast-moving, detailed object
- may also be seen during dissolves and fades
- blocky grid remains fixed while the object moves under it
- caused by poor motion estimation and/or insufficient allocation of bits
Mosquito Noise
- may be seen at the edges of text, logos and other sharply defined objects
- the edge causes high frequency DCT terms, which are coarsely quantized and spread spatially when transformed back into the pixel domain