Multimedia Communication Systems 1 MULTIMEDIA SIGNAL CODING AND TRANSMISSION DR. AFSHIN EBRAHIMI


Basics: Video and Animation
Video and Animation
  Basic concepts
  Television standards
  MPEG
  Digital Video Broadcasting
  Computer-based animation

Video Signal Representation
Video signal representation includes:
  Visual representation
  Transmission
  Digitization
Several important measures for the visual representation:
1. Vertical Detail and Viewing Distance
  The smallest detail that can be reproduced is a pixel.
  Ideally: one pixel for every detail of a scene.
  In practice: some details fall between scanning lines.
  Kell factor: only about 70% of the vertical detail is represented, because some details of the scene fall between the scanning lines (a value determined empirically by measurement).

Visual Representation
The Kell factor of 0.7 is independent of the scanning method, i.e. it applies whether scanning is progressive (lines scanned sequentially) or interlaced (lines scanned alternately: line 1, line 3, ..., line n-1, then line 2, line 4, ...).
[Figure: scan lines crossing a detail of which only 2 out of 3 components can be represented]

Visual Representation
2. Horizontal Detail and Picture Width
  Picture width for conventional television service is 4/3 × picture height.
  The geometry of the television image is based on the aspect ratio, which is the ratio of the picture width W to the picture height H (W:H).
  The conventional aspect ratio for television is 4:3 ≈ 1.33.
  Modern systems use 16:9 ≈ 1.78.
  The viewing distance D determines the angle α subtended by the picture height H. This angle is characterized by the ratio of picture height to viewing distance: tan(α) = H/D.

Visual Representation
3. Total Detail Content of the Image
  Total number of picture elements = number of vertical elements × number of horizontal elements = (vertical resolution)^2 × aspect ratio = 525^2 × 4/3 (for NTSC)
4. Perception of Depth (3D impression)
  In natural vision: angular separation of the images received by the two eyes
  Television image: perspective appearance of objects, choice of focal length of the camera lens, changes in depth of camera focus

Visual Representation
5. Luminance and Chrominance
  Usually not RGB but YUV (or a variant) is used.
6. Temporal Aspects of Illumination
  Motion is represented by a rapid succession of slightly different still pictures (frames). A discrete sequence of pictures is perceived as continuous motion, thanks to a convenient limitation of human perception.
  Between frames, the light is cut off briefly. For a realistic presentation, two conditions are required:
    The repetition rate must be high enough to guarantee smooth motion
    The persistence of vision must extend over the interval between flashes
7. Continuity of Motion
  To perceive continuous motion, the frame rate must be higher than 15 frames/sec. For smooth motion, the frame rate should be 24-30 frames/sec.

Visual Representation
8. Flickering
  At low refresh rates, a periodic fluctuation of brightness (a flicker effect) becomes visible.
  How can this disturbing effect be avoided? A first trick: display each picture several times.
    E.g. at 16 pictures per second the flicker is very disturbing; displaying every picture 3 times gives a refresh rate of 3 × 16 = 48 Hz.
  To avoid flicker, a refresh rate of at least 50 Hz is needed.
  Computer displays achieve a refresh rate of 70 Hz by using a refresh buffer.
  A TV picture is divided into two half-pictures by line interleaving.
  A refresh rate of 25 Hz (PAL) for the full TV picture therefore requires a scan rate of 2 × 25 Hz = 50 Hz.

Visual Representation
9. Temporal Aspect of Video Bandwidth
  The eye requires a video frame to be scanned every 1/25 second.
  Scan rate and resolution determine the video bandwidth needed for transmission.
  During one cycle of the video frequency (i.e. 1 Hz), at most two horizontally adjacent pixels can be scanned.
  Vertical resolution and frame rate determine the horizontal (line) scan frequency:
    vertical lines (b) × frame rate (c) = horizontal scan frequency
  Horizontal resolution and scan frequency determine the video bandwidth:
    horizontal pixels per line (a) × horizontal scan frequency / 2 = video bandwidth
    video bandwidth = a × b × c / 2 (since 2 horizontally adjacent pixels can be represented during one cycle of the video frequency)
Example: a computer system with a resolution of a = 1312 by b = 800 pixels, of which 1024 × 768 are visible, and a frame rate of c = 100 Hz needs:
  a horizontal scan frequency of 800 × 100 Hz = 80 kHz
  a video bandwidth of 1312 × 80 kHz / 2 = 52.48 MHz
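The bandwidth relation above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function names are made up for illustration.

```python
def horizontal_scan_frequency(lines_per_frame: int, frame_rate_hz: float) -> float:
    """Lines scanned per second: vertical lines (b) times frame rate (c)."""
    return lines_per_frame * frame_rate_hz


def video_bandwidth(pixels_per_line: int, lines_per_frame: int,
                    frame_rate_hz: float) -> float:
    """Bandwidth in Hz, assuming two adjacent pixels per cycle (a * b * c / 2)."""
    line_frequency = horizontal_scan_frequency(lines_per_frame, frame_rate_hz)
    return pixels_per_line * line_frequency / 2


# Values from the example: a = 1312, b = 800, c = 100 Hz
print(horizontal_scan_frequency(800, 100) / 1e3, "kHz")  # 80.0 kHz
print(video_bandwidth(1312, 800, 100) / 1e6, "MHz")      # 52.48 MHz
```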

Digitization
For image processing or transmission, the analog picture or video must be converted to a digital representation. Digitization consists of:
  Sampling, where the color/grey level in the picture is measured at an M × N array of pixels.
  Quantizing, where the values in a continuous range are divided into k intervals.
    For a satisfactory reconstruction of a picture from quantized samples, 100 or more quantizing levels might be needed.
    Very often 256 levels are used, which can be represented with 8 bits.
A digital picture consists of an array of integer values representing pixels (as in a simple bitmap image format).
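As a rough illustration of the quantizing step, the sketch below maps already-sampled intensities in the range 0.0 to 1.0 onto k equal-width intervals. The value range, the uniform interval width, and the function names are assumptions made for the example; the slides do not specify them.

```python
def quantize(value: float, levels: int = 256,
             lo: float = 0.0, hi: float = 1.0) -> int:
    """Return the index (0 .. levels-1) of the interval that contains value."""
    if not lo <= value <= hi:
        raise ValueError("sample outside the quantizer range")
    step = (hi - lo) / levels
    index = int((value - lo) / step)
    return min(index, levels - 1)  # value == hi belongs to the top interval


def digitize(samples, levels: int = 256):
    """Quantize an M x N array (list of rows) of sampled intensities."""
    return [[quantize(v, levels) for v in row] for row in samples]


picture = digitize([[0.0, 0.5], [1.0, 0.25]])
bits_per_pixel = (256 - 1).bit_length()  # 256 levels fit into 8 bits
```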

Video Controller Standards
[Table not preserved in this transcription]

Television - NTSC
Video format standard for conventional television systems in the USA (since 1954): NTSC (National Television Systems Committee)
  Picture size: 525 rows, aspect ratio 4:3, refresh rate of 30 frames/sec
  Uses a YIQ signal (in principle a slight variation of the YUV scheme):
    Y = 0.30 R + 0.59 G + 0.11 B
    I = 0.60 R - 0.28 G - 0.32 B
    Q = 0.21 R - 0.52 G + 0.31 B
Composite signal for transmission to receivers:
  The individual components (YIQ) are combined into one signal.
  The basic information consists of luminance information and chrominance difference signals.
  Appropriate modulation methods are used to eliminate interference between luminance and chrominance signals.
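A minimal sketch of the RGB to YIQ conversion follows. It uses the rounded coefficients with the standard NTSC signs, in which each chrominance row sums to zero, so a grey input gives I = Q = 0. The function name and the 0.0 to 1.0 input range are assumptions for illustration.

```python
def rgb_to_yiq(r: float, g: float, b: float):
    """Convert one RGB pixel (components in 0.0 .. 1.0) to YIQ."""
    y = 0.30 * r + 0.59 * g + 0.11 * b   # luminance
    i = 0.60 * r - 0.28 * g - 0.32 * b   # in-phase chrominance
    q = 0.21 * r - 0.52 * g + 0.31 * b   # quadrature chrominance
    return y, i, q


# A grey pixel carries luminance only; both chrominance values vanish.
grey_y, grey_i, grey_q = rgb_to_yiq(0.5, 0.5, 0.5)
```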

A Typical NTSC Encoder
[Figure not preserved in this transcription]

NTSC
  The required bandwidth to transmit NTSC signals is 4.2 MHz, or 6 MHz including sound.
  The luminance (Y) or monochrome signal uses 3.58 MHz of bandwidth.
  The I-signal uses 1.5 MHz of bandwidth, the Q-signal 0.5 MHz.
  The I-signal is In-phase with the 3.58 MHz carrier wave; the Q-signal is in Quadrature (90 degrees out of phase) with the 3.58 MHz carrier wave.

Television - PAL
PAL (Phase Alternating Line, invented by W. Bruch/Telefunken, 1963)
  Frame rate of 25 Hz; delay between frames: 1000 ms / 25 frames per sec = 40 ms
  625 lines, aspect ratio 4:3
  Quadrature amplitude modulation similar to NTSC
  Bandwidth: 5.5 MHz
  The phase of the R-Y (V) signal is reversed by 180 degrees from line to line, to reduce color errors caused by amplitude and phase distortion during transmission.
  The chrominance signal C for PAL transmission can be represented as:
  [Formula not preserved in this transcription]

Television Standards
[Table not preserved in this transcription]

Television Standards
-i: interlaced, -p: progressive (non-interlaced)
More modern television standards:
  SDTV (Standard Definition TV): low resolution, aspect ratio not specified
  EDTV (Enhanced Definition TV): minimum of 480 lines, aspect ratio not specified
  HDTV (High Definition TV): minimum of 720 lines, aspect ratio of 16:9

Enhanced Definition TV (EDTV)
EDTV systems are conventional systems that offer improved vertical and/or horizontal resolution by means of several techniques:
  Comb filters improve horizontal resolution by more than 30%, according to the literature.
  Black-and-white information is separated from color information to eliminate rainbow effects while extending resolution.
  Progressive (non-interlaced) scanning improves vertical resolution.
  Blank lines are inserted between active lines and filled with information from the line above, the line below, or the same line in the previous picture.
Another EDTV development is IDTV (Improved Definition Television):
  Intermediate level between NTSC and HDTV (High-Definition Television) in the U.S.
  Improves the NTSC image by using digital memory to double the scanning lines from 525 to 1050.
  One 1050-line image is displayed in 1/60 sec (60 frames/sec).
  Digital separation of chrominance and luminance signals prevents cross-interference.

High Definition Systems (HDTV)
HDTV is characterized by:
  Higher resolution, approx. twice as many horizontal and vertical pixels as conventional systems (1024 × 768, up to 1920 × 1080)
  24-bit pixels
  Bandwidth 5-8 times larger than for NTSC/PAL
  Aspect ratio: 16/9 ≈ 1.78
  Preferred viewing distance: between 2.4 and 3.3 meters
Digital coding is essential in the design and implementation of HDTV:
  Composite coding (sampling of the composite analog video signal, i.e. all signal components are converted together into a digital form) is the most straightforward alternative, but:
    there is cross-talk between luminance and chrominance in the composite signal
    composite coding depends on the television standard
    the sampling frequency cannot be adapted to the bandwidth requirements of the components
    the sampling frequency is not coupled with the color carrier frequency

High Definition Systems (HDTV)
Alternative: component coding (separate digitization of the various image components):
  The more important luminance signal is sampled at 13.5 MHz; the chrominance signals (R-Y, B-Y) are sampled at 6.75 MHz.
  Global bandwidth: 13.5 MHz + 6.75 MHz > 19 MHz
  Luminance and chrominance signals are quantized uniformly with 8 bits.
  Due to high data rates (1050 lines × 600 pixels/line × 30 frames/sec), the bandwidth is approx. 19 MHz; therefore substandards (systems which need a lower data rate) have been defined for transmission.

High Definition Systems (HDTV)
Worldwide, 3 different HDTV systems are being developed:
United States
  Full-digital solution with 1050 lines (960 visible) and a scan rate of 59.94 Hz
  Compatible with NTSC through IDTV
Europe
  HD-MAC (High Definition Multiplexed Analog Components)
  1250 lines (1000 visible), scan rate of 50 Hz
  Halving of lines (625 of 1250) and of full-picture motion allows simple conversion to PAL
  The HD-MAC receiver uses digital image storage to show full resolution and motion
Japan
  MUSE is a modification of the first NHK (Japan Broadcasting Company) HDTV standard
  MUSE is a Direct-Broadcast-from-Satellite (DBS) system, where the 20 MHz bandwidth is reduced by compression to the 8.15 MHz available on the satellite channel
  Full detail of the 1125-line image is retained for stationary scenes; with motion, the definition is reduced by approx. 50%

High Definition Systems (HDTV)
[Table not preserved in this transcription]
-i: interlaced, -p: progressive (non-interlaced)

Television - Transmission (Substandards)
HDTV data rates for transmission
  total picture elements = horizontal resolution × vertical resolution
  USA: 720,000 pixels × 24 Bits/pixel × 60 frames/sec = 1036.8 MBits/sec
  Europe: 870,000 pixels × 24 Bits/pixel × 50 frames/sec = 1044 MBits/sec
  HDTV with 1920 × 1080 pixels, 24 Bits/pixel, 30 frames/sec: approx. 1.5 GBits/sec
A reduction of the data rate is unavoidable, since the required rates exceed the standard capacities provided by broadband networks (e.g. 155 or 34 MBits/sec).
Different substandards for data reduction are defined:
[Table not preserved in this transcription]
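The uncompressed rates quoted above are simply pixels × bits per pixel × frames per second. A small sketch, with a made-up helper name, reproduces them:

```python
def raw_bit_rate(pixels_per_frame: int, bits_per_pixel: int,
                 frames_per_sec: int) -> int:
    """Uncompressed video data rate in bits per second."""
    return pixels_per_frame * bits_per_pixel * frames_per_sec


usa = raw_bit_rate(720_000, 24, 60)           # 1,036,800,000 bit/s
europe = raw_bit_rate(870_000, 24, 50)        # 1,044,000,000 bit/s
full_hd = raw_bit_rate(1920 * 1080, 24, 30)   # 1,492,992,000 bit/s
```

Note that the last figure is about 1.5 gigabits per second, which corresponds to roughly 187 megabytes per second.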

Television: Transmission
Further reduction of data rates is required for picture transmission.
Sampling gaps are left out (only visible areas are coded):
  Luminance has 648 sample values per line, but only 540 of them are visible. Chrominance has 216 sample values per line, but only 180 are visible.
  575 visible lines: (540 + 180 + 180) samples/line × 575 lines/frame = 517,500 samples/frame
  517,500 samples/frame × 8 Bits/sample × 25 frames/sec = 103.5 MBits/sec
Reduction of vertical chrominance resolution:
  Only the chrominance signals of every second line are transmitted.
  575 visible lines: (540 + 90 + 90) samples/line × 575 lines/frame = 414,000 samples/frame
  414,000 samples/frame × 8 Bits/sample × 25 frames/sec = 82.8 MBits/sec
Different source coding:
  Using intra-frame ADPCM with 3 instead of 8 Bits/sample
  414,000 samples/frame × 3 Bits/sample × 25 frames/sec = 31.05 MBits/sec
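The three reduction steps can be traced numerically. The sketch below only restates the slide's arithmetic; the helper names are hypothetical, and the halved chrominance resolution is expressed as an average of 90 chrominance samples per line, as on the slide.

```python
def samples_per_frame(luma_per_line: int, chroma_per_line: int, lines: int) -> int:
    """One luminance and two chrominance components per line."""
    return (luma_per_line + 2 * chroma_per_line) * lines


def bit_rate(samples: int, bits_per_sample: int, frames_per_sec: int) -> int:
    """Data rate in bits per second."""
    return samples * bits_per_sample * frames_per_sec


visible_only = samples_per_frame(540, 180, 575)   # sampling gaps left out
half_chroma = samples_per_frame(540, 90, 575)     # chrominance on every 2nd line

rate_visible = bit_rate(visible_only, 8, 25)      # 103.5 Mbit/s
rate_half_chroma = bit_rate(half_chroma, 8, 25)   # 82.8 Mbit/s
rate_adpcm = bit_rate(half_chroma, 3, 25)         # 31.05 Mbit/s
```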

Compression Techniques
The data rate is still very high:
  Video should have about 1.5 Mbits/sec to fit within CD technology
  Audio should have about 64-192 kbit/sec per channel
Hence the need for further compression techniques, e.g. MPEG for video and audio.

Classification of Applications
Dialogue Mode Applications
  Interaction among human users via multimedia information
  Requirements for compression and decompression:
    End-to-end delay lower than 150 ms
    End-to-end delay of 50 ms for face-to-face dialogue applications
Retrieval Mode Applications
  A human user retrieves information from a multimedia database
  Requirements:
    Fast forward and backward data retrieval with simultaneous display
    Fast search for information in multimedia databases
    Random access to single images and audio frames with an access time of less than 0.5 second
    Decompression should be possible without a link to other data units, in order to allow random access and editing

Dialogue and Retrieval Mode
Requirements for both dialogue and retrieval mode:
  Support for scalable video in different systems
    The format must be independent of frame size and video frame rate
  Support for various audio and video data rates
    This leads to different quality, so data rates should be adjustable
  Synchronization of audio and video data
    Lip synchronization
  Economy (i.e. reasonably cheap solutions):
    Software realization: cheap, but low speed and low quality
    Hardware realization (VLSI chips): more expensive (at first), but high quality
  Compatibility
    It should be possible to generate multimedia data on one system and to reproduce the data on another system
    Programs available on CD can be read on different systems

Encoding Mechanisms for Video
Basic encoding techniques as used in JPEG
Differential encoding for video
  1. For newscasts, video telephone applications, and soap operas
    The background often remains the same for a long time
    Very small difference between subsequent images
    Run-length coding can be used
  2. Motion compensation
    Blocks of N×N pixels are compared in subsequent images
    Useful for objects moving in one direction, e.g. from left to right
Other basic compression techniques:
  Color Look-Up Tables (CLUT) for data reduction in video streams
    Often used in distributed multimedia systems
  Silence suppression for audio
    Data are only encoded if the volume level exceeds a certain threshold
    Can be interpreted as a special case of run-length encoding
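To illustrate why differential encoding combines well with run-length coding, the sketch below takes the difference between two nearly identical rows of pixels and run-length encodes it. The one-dimensional "frames" and all function names are simplifications chosen for the example.

```python
def frame_difference(prev, curr):
    """Pixel-wise difference between two frames of equal length."""
    return [c - p for p, c in zip(prev, curr)]


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


prev = [10, 10, 10, 10, 20, 20]
curr = [10, 10, 10, 12, 20, 20]          # only one pixel changed
diff = frame_difference(prev, curr)       # mostly zeros
encoded = run_length_encode(diff)         # long zero runs collapse
restored = [p + d for p, d in zip(prev, run_length_decode(encoded))]
```

With a static background the difference signal is dominated by zeros, so the run-length pairs are far fewer than the original pixels.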

MPEG
Growing need for a common format for representing compressed video and audio at data rates up to 1.5 Mbit/sec (typical rate of CD-ROM transfer: 1.2 Mbit/sec)
Moving Pictures Expert Group (MPEG)
  Generic approach that can be used widely
  Maximum data rate for video in MPEG is very high: 1,856,000 bit/sec
  Data rates for audio between 32 and 448 Kbit/sec
  Video and audio compression of acceptable quality
  Suitable for symmetric as well as asymmetric compression:
    Asymmetric compression: more effort for coding (once) than for decoding (often)
    Symmetric compression: equal effort for compression and decompression, restricted end-to-end delay (e.g. interactive dialogue applications)

MPEG Today
MPEG-1
  Coding for VCD (Video CD) quality
  Data rate of 0.9-2 Mbit/sec
MPEG-2
  Superset of MPEG-1: rates up to 8 Mbit/sec
  Can do HDTV
MPEG-4
  Coding of objects, not frames
  Lower bandwidth (multimedia for the web and mobility)
MPEG-7
  Allows multimedia content description (ease of searching)
MPEG-21
  Content identification and management
MP3
  For coding audio only
  MPEG Layer-3

The MPEG Family
[Figure not preserved in this transcription]

First: MPEG
The exact image format of MPEG is defined in the image preparation phase (which is similar to JPEG).
  Video is seen as a sequence of images (video frames).
  Each image consists of 3 components (YUV format, called Y, CB, CR).
  The luminance component has twice as many samples along the horizontal and vertical axes as the other two components (chrominance): 4:2:0 scheme.
  The resolution of the luminance component is at most 768 × 576 pixels (8 bits per pixel).
The data stream includes further information, e.g.:
  Aspect ratio of a pixel (14 different pixel aspect ratios are provided), e.g. 1:1 (square pixel), 16:9 (European and US HDTV) etc.
  Image refresh frequency (number of images per second; 8 frequencies between 23.976 Hz and 60 Hz are defined, among them the European standard of 25 Hz)
The encoding is basically as in JPEG (DCT, quantization, entropy encoding), but takes several images into account.

Compression Steps
Subsampling of chrominance information: the human visual system is less sensitive to chrominance than to luminance information
  only 1 chrominance pixel for each 2 × 2 neighborhood of luminance pixels
Image preparation
  form blocks of 8 × 8 pixels and group them into macro blocks
Frequency transformation: discrete cosine transform
  converts an 8 × 8 block of pixel values to an 8 × 8 matrix of horizontal and vertical spatial frequency coefficients
  most of the energy is concentrated in the low-frequency coefficients, especially in the DC coefficient
Quantization
  for suppressing high frequencies
Variable length coding
  assigning codewords to the values to be encoded
In addition to the techniques used in JPEG, the following are applied before performing the DCT:
  Predictive coding: code a frame as a prediction based on the previous / the following frame
  Motion compensation: prediction of the values of a pixel block by relocating the block from a known picture
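The frequency transformation and quantization steps can be sketched in pure Python. This is a direct, unoptimized 8 × 8 DCT-II with orthonormal scaling, followed by a single uniform quantizer step. The uniform step is an assumption made to keep the example short; real encoders use per-coefficient quantization matrices and fast DCT algorithms.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of rows)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize_uniform(coeffs, step):
    """Divide every coefficient by one step size and round (assumed uniform)."""
    return [[round(c / step) for c in row] for row in coeffs]


flat_block = [[100] * N for _ in range(N)]   # a block with no detail
coeffs = dct_2d(flat_block)                  # all energy lands in the DC term
levels = quantize_uniform(coeffs, 16)
```

For the flat block only `coeffs[0][0]` is non-zero, which is the extreme case of energy concentrating in the DC coefficient.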

Macro Blocks
Still images: temporal prediction yields a considerable compression ratio
Moving images: non-translational motion patterns (e.g. rotations, waves, ...) require storage of a large amount of information due to irregular motion patterns
Therefore: predictive coding makes sense only for parts of the image
  division of each image into areas called macro blocks
Macro blocks turn out to be suitable for compression based on motion estimation.
  A macro block is partitioned into 4 blocks for luminance and one block for each chrominance component; each block consists of 8 × 8 pixels.
  The size of a macro block is a compromise between the (storage) cost of prediction and the resulting compression.
[Figure: macro block with luminance (Y) blocks and chrominance blocks]
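The partition described above can be sketched as follows: a 16 × 16 luminance area is cut into four 8 × 8 blocks, and each 16 × 16 chrominance area is reduced to one 8 × 8 block. Averaging each 2 × 2 neighborhood is an assumed subsampling filter for illustration, and the function names and dictionary layout are hypothetical.

```python
def split_luma(area):
    """Cut a 16 x 16 luminance area into four 8 x 8 blocks (row-major order)."""
    return [[row[c:c + 8] for row in area[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]


def subsample_2x2(plane):
    """Replace every 2 x 2 neighborhood by its average (assumed filter)."""
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]


def make_macroblock(y_area, cb_area, cr_area):
    """Bundle 4 luminance blocks and 1 block per chrominance component."""
    return {"Y": split_luma(y_area),
            "Cb": subsample_2x2(cb_area),
            "Cr": subsample_2x2(cr_area)}


y_area = [[r * 16 + c for c in range(16)] for r in range(16)]
flat = [[8] * 16 for _ in range(16)]
mb = make_macroblock(y_area, flat, flat)
```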

Motion Compensation Prediction
Motion compensation prediction is made between successive frames.
  Observation: coding a frame purely as a prediction from the previous frame is ineffective for fast-changing sequences.
  Thus: account for moving objects by searching for the new position of a macro block from the previous frame.
  Code the prediction relative to the macro block of the previous frame together with the motion vector.

Motion Compensation Prediction
How to find the best-fitting position in the new image?
  Search only a given window around the old position, not the whole image.
  Consider only the average of all pixel values, not the detailed values.
  Set an error threshold: stop searching if a macro block is found that fits well enough.
The search pattern can be a spiral: this procedure in general does not give the best result, but it is fast.
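A windowed block search with an early-exit threshold might look like the sketch below. Two substitutions are worth stating plainly: the matching cost is the sum of absolute differences (SAD) rather than the cheaper comparison of block averages mentioned above, and candidates are visited in order of increasing distance from the old position, which only approximates a spiral. Block size, window radius, and all names are illustrative assumptions.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(prev, curr, x, y, size=4, radius=2, threshold=0):
    """Search prev, within +/- radius of (x, y), for the block at (x, y) in curr.

    Returns ((dx, dy), cost). The search stops early once cost <= threshold.
    """
    height, width = len(prev), len(prev[0])
    offsets = sorted(((dx, dy)
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)),
                     key=lambda d: d[0] * d[0] + d[1] * d[1])
    best = None
    for dx, dy in offsets:
        px, py = x + dx, y + dy
        if not (0 <= px <= width - size and 0 <= py <= height - size):
            continue  # candidate block would leave the frame
        cost = sad(curr, x, y, prev, px, py, size)
        if best is None or cost < best[0]:
            best = (cost, dx, dy)
        if cost <= threshold:
            break  # good enough, stop searching
    cost, dx, dy = best
    return (dx, dy), cost


# Test data: the current frame is the previous one shifted right by one pixel.
prev = [[r * 8 + c for c in range(8)] for r in range(8)]
curr = [[row[0]] + row[:-1] for row in prev]
vector, cost = find_motion_vector(prev, curr, x=3, y=2)
```

Here the block content at (3, 2) in the current frame is found one pixel to the left in the previous frame, so the vector is (-1, 0) with zero cost.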

Motion Compensation Prediction
Alternative search procedure:
  Store the motion vector for a macro block of the previous image
  Move the search window for the next search by the old vector
  Start searching the window with a coarse pattern
  Refine the search around the best patterns found
[Figure, left: the lighter grey blocks are the blocks searched on the coarse scale. The best-matching fields are examined in more detail by also searching their 8 neighboring blocks.]
[Figure, right: the picture that would result if all blocks were refined; the lighter a block, the better the match; the lightest block is taken for motion prediction.]
Note: in reality not all blocks are refined, only the lightest ones; as a result, the best block may not be found.

Types of Frames
4 different types of frame coding are used in MPEG for efficient coding with fast random access:
  I-frames: Intra-coded frames (moderate compression but fast random access)
  P-frames: Predictive-coded frames (with motion compensation, prediction from the previous I- or P-frame)
  B-frames: Bi-directionally predictive-coded frames (referencing the previous and the following I- or P-frames)
  D-frames: DC frames (limited use: encode only the DC components of intraframe coding)
A B-frame can be decoded only after the subsequent P-frame has been decoded.

Types of Images
[Figure: prediction and bi-directional prediction between frames along the time axis]
I : P : B = 1 : 2 : 6

I-Frames
I-frames are self-contained, i.e. they represent a full image
  Coded without reference to other images
  Treated as still images
    Use of JPEG, but compression in real time
I-frames may serve as points of random access in MPEG streams
DCT on 8 × 8 blocks within macro blocks + DPCM coding of DC coefficients
Typically, an I-frame may occur 3 times per second to give reasonably fast random access
Typical data allocation:
  I-frames are allocated up to 3 times as many bits as P-frames
  P-frames are allocated 2 to 5 times as many bits as B-frames
  In case of little motion in the video, a greater proportion of the bits should be assigned to I-frames, since P- and B-frames then need only a very low number of bits

P-Frames and B-Frames
P-Frame
  Requires information about the previous I-frame and/or previous P-frame
  Motion estimation is done for the macro blocks of the coded frame:
    The motion vector (difference between the locations of the macro blocks) is specified
    The (typically small) difference in content of the two macro blocks is computed and DCT/entropy encoded
  That means: P-frames consist of I-frame macro blocks (if no prediction is possible) and predictive macro blocks
B-Frame
  Requires information about the previous and the following I- and/or P-frame
  B-frame = difference from the prediction based on the past image and the following P-/I-frame
  Quantization and entropy encoding of the macro blocks is very efficient on such double-predicted frames
  The highest compression rate is obtained
  Decoding is possible only after receiving the following I- or P-frame

Group of Pictures (GOP)
MPEG does not prescribe the order in which the different frame types are coded; the order can be specified by a user parameter.
But: each stream of MPEG frames shows a fixed pattern, the Group of Pictures:
  It typically starts with an I-frame
  It typically ends with the frame right before the next I-frame
  An open GOP ends in a B-frame, a closed GOP in a P-frame
  Very flexible: GOPs can be decoded independently, but they can also reference the next GOP
Typical patterns:
  I B B P B B P B B I
  I B B P B B P B B P B B I
Why not use only P- and B-frames?
  After the loss of one frame, a new full image (I-frame) is needed to allow the receiver to recover from the information loss.
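Because a B-frame can only be decoded once its following reference frame is available, the order in which frames are transmitted differs from the display order. The sketch below reorders a display-order pattern accordingly. It is a simplified illustration with a hypothetical function name, not a description of the bitstream syntax.

```python
def transmission_order(display_pattern: str):
    """Reorder frames so every B-frame follows both of its reference frames.

    display_pattern is a string such as "IBBPBBPBBI" in display order.
    Returns labels like "P3" (frame type plus display index).
    """
    ordered = []
    pending_b = []
    for index, frame_type in enumerate(display_pattern):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            pending_b.append(label)       # wait for the next reference frame
        else:
            ordered.append(label)         # I- or P-frame goes out first
            ordered.extend(pending_b)     # then the B-frames that depend on it
            pending_b = []
    # Trailing B-frames stay at the end: their forward reference
    # lies in the next GOP.
    ordered.extend(pending_b)
    return ordered


order = transmission_order("IBBPBBPBBI")
```

For the first typical pattern this yields I0, P3, B1, B2, P6, B4, B5, I9, B7, B8.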

D-Frames
Intraframe encoded, but only the lowest frequencies of an image (DC coefficients) are encoded
Used (only) for fast forward or fast rewind mode
  Could also be realized by a suitable order of I-frames
Slow rewind playback requires a lot of storage capacity:
  All images in a "group of pictures" (GOP) are first decoded in forward mode and stored; after that, rewind playback is possible

Coding Process
[Figure not preserved in this transcription]

Layers of MPEG Data Streams
1. Sequence Layer
  Sequence header + one or more groups of pictures
  The header contains parameters like picture size, data rate, aspect ratio, DCT quantization matrices
2. Group of Pictures Layer
  Contains at least one I-frame for random access
  Additionally timing info and user data
3. Picture Layer
  I-, P-, B- or D-frame, with synchronization info, resolution, range of motion vectors
4. Slice Layer
  Subdivision of a picture providing a certain immunity to data corruption
5. Macro Block Layer
  Basic unit for motion compensation and quantizer scale changes
6. Block Layer
  Basic coding unit (8 × 8 pixels): the DCT is applied at this block level

Hierarchical Structure of Data
[Figure: nesting of Sequence Layer, GOP Layer, Picture Layer, Slice Layer, Macroblock Layer, Block Layer]

Audio Encoding
Audio Encoding within MPEG
  Picture encoding principles can be modified for use in audio as well
  Sampling rates of 32, 44.1 and 48 kHz
  Transformation into the frequency domain by Fast Fourier Transform (FFT), similar to the technique used for video
  The audio spectrum is split into 32 non-interleaved subbands (for each subband, the audio amplitude is calculated); the noise level is determined by a psychoacoustic model
    A psychoacoustic model takes human perception into account, e.g. recognizing only a single tone if two similar tones are played very close together, or not perceiving the quieter tone if two tones of very different loudness are played simultaneously
  Each subband has its own quantization granularity
    Higher noise level: rough quantization (and vice versa)
  Single channel, two independent channels, or stereo are possible (in the case of stereo, the redundancy between the two signals is used for a higher compression ratio)

Audio Encoding
[Figure not preserved in this transcription]

Audio Encoding
3 different layers of encoder and decoder complexity are used
  Quantized spectral portions of layers 1 and 2 are PCM encoded
  Quantized spectral portions of layer 3 are Huffman encoded
    MPEG layer 3 is known as mp3
14 fixed bit rates for the encoded audio data stream on each layer
  Minimal rate: 32 Kbit/sec for each layer
  Maximal rate: 448 Kbit/sec (layer 1), 384 Kbit/sec (layer 2), 320 Kbit/sec (layer 3)
Variable bit rate support is possible only on layer 3

Audio Data Stream
General Background on the MPEG Audio Data Stream
  MPEG specifies a syntax for interleaved audio and video streams, e.g. synchronization information
  The audio data stream consists of frames, divided into audio access units composed of slots
  Slots consist of 4 bytes (layer 1, lowest complexity) or 1 byte (other layers)
  A frame always consists of a fixed number of samples
  Audio access unit: the smallest possible audio sequence of compressed data that can be decoded independently of all other data
  The audio access units of one frame lead to a playing time between 8 msec (48 kHz) and 12 msec (32 kHz)
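Since a frame holds a fixed number of samples, its playing time depends only on the sampling rate. The sketch below assumes 384 samples per frame, which is the Layer 1 frame size and is consistent with the 8 to 12 msec range given above; the slides themselves do not state the sample count.

```python
def frame_duration_ms(samples_per_frame: int, sampling_rate_hz: int) -> float:
    """Playing time of one audio frame in milliseconds."""
    return 1000 * samples_per_frame / sampling_rate_hz


LAYER1_SAMPLES = 384  # assumed frame size (MPEG audio Layer 1)

shortest = frame_duration_ms(LAYER1_SAMPLES, 48_000)  # 8.0 ms at 48 kHz
longest = frame_duration_ms(LAYER1_SAMPLES, 32_000)   # 12.0 ms at 32 kHz
```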

MPEG-2
Why another MPEG standard?
- Higher data rate than MPEG-1, but a compatible extension of MPEG-1
  - Target rate of 40 Mbit/sec
  - Higher resolution, as needed in HDTV
- Support for a larger number of applications: MPEG-2 is defined in terms of extensible profiles and levels for each important application class, e.g.
  - Main Profile: for digital video transmission (2 to 80 Mbit/sec) over cable, satellite and other broadcast channels, digital storage, HDTV etc.
  - High Profile: HDTV
  - Scalable Profile: compatible with terrestrial TV/HDTV, packet-network video systems, backward compatibility with MPEG-1 and other standards, e.g. H.261
- The encoding standard should be a toolkit rather than a flat procedure
  - Interlaced and non-interlaced frames
  - Different color subsampling modes, e.g. 4:2:2, 4:2:0
  - Flexible quantization schemes that can be changed at picture level
  - Scalable bit-streams

MPEG-2 - Profiles and Levels (figure)

Scalable Profiles
A signal is composed of several streams (layers):
- The base (lower) layer is a fully decodable image.
- The enhancement (upper) layer gives additional information:
  - Better resolution
  - Higher frame rate
  - Better quality
- Corresponds to the JPEG hierarchical mode.

Scalable Profiles
Scaling can be done on different parameters:
- Spatial scaling:
  - Frames are given in different resolutions.
  - Base layer frames are used in any case.
  - Upper layer frames are stored as predictions from base layer frames.
  - A single data stream can include different image formats (CIF, CCIR 601, HDTV, ...).
- SNR scaling:
  - The quantization error of the lower layer is encoded and sent on the upper layer.
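
The idea of SNR scaling can be illustrated with a toy two-layer quantizer. This is a minimal sketch, not the MPEG-2 procedure itself: the coefficient values, step sizes and function names are hypothetical, and a plain uniform quantizer stands in for the real quantization scheme.

```python
def quantize(values, step):
    """Uniform quantizer: map each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]


def snr_scalable_encode(coeffs, coarse_step, fine_step):
    """Base layer: coarse quantization. Enhancement layer: quantized base-layer error."""
    base = quantize(coeffs, coarse_step)
    residual = [c - b for c, b in zip(coeffs, base)]
    enhancement = quantize(residual, fine_step)
    return base, enhancement


def decode(base, enhancement=None):
    """A decoder may use the base layer alone or add the enhancement layer."""
    if enhancement is None:
        return list(base)
    return [b + e for b, e in zip(base, enhancement)]


coeffs = [103.0, -47.0, 12.0, 5.0]  # made-up transform coefficients
base, enh = snr_scalable_encode(coeffs, coarse_step=16, fine_step=2)

base_error = max(abs(c - d) for c, d in zip(coeffs, decode(base)))
full_error = max(abs(c - d) for c, d in zip(coeffs, decode(base, enh)))
print("max error, base layer only:", base_error)
print("max error, base + enhancement:", full_error)
```

A receiver that only gets the base layer still decodes a usable signal; adding the upper layer reduces the remaining quantization error.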

MPEG-2: Effects of Interlacing
Prediction modes and motion compensation:
- Frame prediction: the current frame is predicted from the previous frame.
- Dual prime motion compensation:
  - The top field of the current frame is predicted from two motion vectors coming from the top and bottom fields of the reference frame.
  - The bottom field of the previous frame and the top field of the current frame predict the bottom field of the current frame.
- 16 × 8 motion compensation mode:
  - A macroblock may have two motion vectors.
  - A B-image macroblock may have four.

MPEG-2 Audio Standard
- Low bit rate coding of multi-channel audio
  - Up to five full-bandwidth channels (left, right, center, 2 surround) plus an additional low-frequency enhancement channel and/or up to 7 commentary/multilingual channels
  - Extension of MPEG-1 stereo and mono coding to half sampling rates (16 to 24 kHz), improving quality for bit rates at 64 Kbit/sec per channel
- MPEG-2 Audio Multi-channel Coding Standard:
  - Backward compatibility with the existing MPEG-1 Audio Standard
  - Organizes formal testing of proposed MPEG-2 multi-channel audio codecs and non-backward-compatible codecs (rates 256 to 448 Kbit/sec)

MPEG-2 Streams
- The MPEG-2 system layer defines how to combine audio, video and other data into single or multiple streams suitable for storage and transmission: syntactic and semantic rules for synchronizing the decoding and presentation of video and audio information and for avoiding buffer overflow or underflow.
- Streams include timestamps for decoding, presentation and delivery.
- Basic multiplexing step: system-level information is added to each stream and the stream is packetized, giving a Packetized Elementary Stream (PES).
- PESs are combined into a Program Stream or a Transport Stream (supporting a large number of applications):
  - Program Stream: similar to an MPEG-1 stream; for error-free environments, variable packet lengths, constant end-to-end delay
  - Transport Stream: combines PESs and independent time bases into a single stream; for use in lossy or noisy media, packet length 188 bytes including header; suited for digital TV and videophony over fiber, satellite, cable, ISDN, ATM
- Conversion between Program and Transport Stream is possible (and sometimes reasonable).
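
To make the fixed 188-byte Transport Stream packet more concrete, the following sketch reads a few fields of the 4-byte packet header. The field layout (sync byte 0x47, 13-bit packet identifier, 4-bit continuity counter) comes from the MPEG-2 Systems specification rather than from the slide, and the example packet is made up.

```python
TS_PACKET_SIZE = 188  # bytes, including the 4-byte header
SYNC_BYTE = 0x47      # first byte of every transport stream packet


def parse_ts_header(packet: bytes) -> dict:
    """Extract a few header fields from one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier: low 5 bits of byte 1, all of byte 2
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }


# Hypothetical packet: PID 0x0100, payload unit start set, counter 5,
# followed by 184 stuffing bytes instead of real PES data.
example = bytes([0x47, 0x41, 0x00, 0x15]) + bytes([0xFF] * 184)
print(parse_ts_header(example))
```

The short fixed packet length and the sync byte are what let a receiver resynchronize quickly after errors on lossy or noisy media.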

MPEG-4
- Originally an MPEG-3 standard for HDTV was planned, but MPEG-2 scaling was sufficient; development of MPEG-3 was cancelled.
- The MPEG-4 initiative started in September 1993: very low bit rate coding of audio-visual programs.
- February 1997: description of requirements for the MPEG-4 standard approved.
- Idea: development of fundamentally new algorithmic techniques
  - New sorts of interactivity (dynamic instead of static objects)
  - Integration of natural and synthetic audio and video material
  - Simultaneous use of material coming from different sources
  - Model-based image coding of human interaction with multimedia environments
  - Low bit rate speech coding, e.g. for use in GSM
- Basic elements:
  - Coding tools for audio-visual objects: efficient compression, support of object-based interactivity, scalability and error robustness
  - Formal methods for the syntactic description of coded audio-visual objects

Core Idea of MPEG-4: Object-based Representation
- The video scene is represented as a composition of video objects with respect to their spatial and temporal relationships (the same holds for audio).
- Individual objects in a scene can be coded with different parameters, at different quality levels and with different coding algorithms.

MPEG-4: Objects and Scenes
- A/V object
  - A video object within a scene
  - The background
  - An instrument or voice
  - Coded independently
- A/V scene
  - Mixture of objects
  - Individual bitstreams are multiplexed and transmitted
  - One or more channels
  - Each channel may have its own quality of service
  - Synchronization information

Objects of a Scene
Scene graph:
- Graph without cycles
- Embeds objects in a coordinate system (including synchronization information)
- MPEG-4 provides a language for describing objects (modeled on VRML, the Virtual Reality Modeling Language)
- Usable for video as well as for animated objects

An Example MPEG-4 Scene (figure)

MPEG-4 Stream Composition and Delivery (figure)

Linking Streams into the Scene (figure)

MPEG-7
- Objectives
  - A flexible, extensible, multi-level standard framework for describing (not coding!) multimedia content and for synchronizing content and descriptions
  - Enable fast and efficient content searching, filtering and identification
  - Define low-level features, structure, semantics, models, collections, creation, etc.
  - Goal: to search, identify, filter and browse audiovisual content
- Description of contents
  - Descriptors: describe basic characteristics of audiovisual content. Examples: shape, color, texture, ...
  - Description Schemes: describe combinations of descriptors. Example: spoken content

Simple Description (figure)

MPEG-21
- Solution for access to and management of digital media
- E.g. offering, searching, buying, Digital Rights Management, ...

Digital Video Broadcasting
- 1991: foundation of the ELG (European Launching Group)
  - Goal: development of digital television in Europe
- 1993: renamed DVB (Digital Video Broadcasting)
  - Goal: introduction of digital television based on
    - satellite transmission (DVB-S)
    - cable network technology (DVB-C)
    - later also terrestrial transmission (DVB-T)

DVB Container
- DVB transmits MPEG-2 containers
  - High flexibility for the transmission of digital data
  - No restrictions regarding the type of information
- DVB Service Information specifies the content of a container:
  - NIT (Network Information Table): lists the services of a provider, contains additional information for set-top boxes
  - SDT (Service Description Table): list of names and parameters for each service within an MPEG multiplex channel
  - EIT (Event Information Table): status information about the current transmission, additional information for set-top boxes
  - TDT (Time and Date Table): update information for set-top boxes

DVB Worldwide (figure)

Computer-based Animation
- To animate = to bring to life
- Animation covers changes in:
  - time-varying positions (motion dynamics)
  - shape, color, transparency, structure and texture of an object (update dynamics)
  - lighting, camera position, camera orientation and focus
- Basic concepts of animation:
  - Input process: key frames, in which animated objects are at extreme or characteristic positions, must be digitized from drawings. Post-processing by a computer is often required.
  - Composition stage
  - Inbetween process
  - Changing colors

Composition Stage
- Foreground and background figures are combined to generate an individual frame.
- Placing several low-resolution frames of an animation in an array yields a trial film (pencil test) by use of the pan-zoom feature (available for some frame buffers).
- The frame buffer can take a part of an image (pan) and enlarge it to full screen (zoom).
- Continuity is achieved by repeating the pan-zoom process fast enough.

Inbetween Process
- Composition of intermediate frames between key frames
- Performed by linear interpolation (lerping) between start and end positions
- To achieve more realistic results, cubic spline interpolation can be used
- Figure: interpolated frames between key frames; linear interpolation gives rather unrealistic motion (in most cases)
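
Linear inbetweening can be sketched in a few lines. The function names and the 2D key positions are illustrative assumptions; a real system would interpolate every animated parameter of the key frames, not just one position.

```python
def lerp(start, end, t):
    """Linear interpolation between two points for a parameter t in [0, 1]."""
    return tuple(s + t * (e - s) for s, e in zip(start, end))


def inbetweens(key_a, key_b, count):
    """Return `count` intermediate positions strictly between two key positions."""
    return [lerp(key_a, key_b, k / (count + 1)) for k in range(1, count + 1)]


# Hypothetical key positions of one object in two consecutive key frames
frames = inbetweens((0.0, 0.0), (8.0, 4.0), 3)
for position in frames:
    print(position)
```

Because every inbetween lies on the straight line between the key positions and the speed changes abruptly at each key frame, the resulting motion tends to look mechanical, which is the motivation for spline interpolation.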

Inbetween Process
More realistic motion is achieved by cubic splines (figure: two cubic spline segments through the points $x_0$, $x_1$, $x_2$).
A function $s$ is called a cubic interpolating spline to the points $a = x_0 < x_1 < \dots < x_n = b$ if
1. $s$ is twice continuously differentiable, and
2. on each interval $[x_i, x_{i+1}]$, $i = 0, \dots, n-1$, it is a polynomial of degree 3.
The curve is smooth because the polynomials have equal first and second derivatives at the interior points $x_1, \dots, x_{n-1}$.

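Inbetween Process
Calculation of successive cubic splines:
The segments $s_i(x)$ are polynomials of degree 3, where $s_i$ is the segment on $[x_i, x_{i+1}]$. Let $s_i(x)$ be given. Then $s_{i+1}(x) = a_{i+1} x^3 + b_{i+1} x^2 + c_{i+1} x + d_{i+1}$ is constructed as follows:
$$s_{i+1}(x_{i+1}) = a_{i+1} x_{i+1}^3 + b_{i+1} x_{i+1}^2 + c_{i+1} x_{i+1} + d_{i+1} = f(x_{i+1})$$
$$s_{i+1}(x_{i+2}) = a_{i+1} x_{i+2}^3 + b_{i+1} x_{i+2}^2 + c_{i+1} x_{i+2} + d_{i+1} = f(x_{i+2})$$
$$s'_{i+1}(x_{i+1}) = 3 a_{i+1} x_{i+1}^2 + 2 b_{i+1} x_{i+1} + c_{i+1} = s'_i(x_{i+1})$$
$$s''_{i+1}(x_{i+1}) = 6 a_{i+1} x_{i+1} + 2 b_{i+1} = s''_i(x_{i+1})$$
This gives 4 equations for the 4 unknowns $a_{i+1}, b_{i+1}, c_{i+1}, d_{i+1}$.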
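
The four conditions (two interpolation conditions plus matching first and second derivatives at the joint) can be solved directly if the new cubic is written in powers of the distance from the joint. The sketch below uses that shifted form, which is an equivalent rewriting of a cubic in powers of x and not the notation of the slide; the function and parameter names are assumptions.

```python
def next_spline_segment(x1, f1, slope, curvature, x2, f2):
    """Build the cubic on [x1, x2] that continues a given previous segment.

    slope and curvature are the first and second derivative of the previous
    segment at x1; f1 and f2 are the values to interpolate at x1 and x2.
    The cubic is s(x) = c0 + c1*u + c2*u**2 + c3*u**3 with u = x - x1.
    """
    h = x2 - x1
    c0 = f1                  # s(x1) = f(x1)
    c1 = slope               # s'(x1) matches the previous segment
    c2 = curvature / 2.0     # s''(x1) matches the previous segment
    c3 = (f2 - c0 - c1 * h - c2 * h * h) / h ** 3  # s(x2) = f(x2)

    def s(x):
        u = x - x1
        return c0 + c1 * u + c2 * u * u + c3 * u ** 3

    # Derivatives at x2, needed to construct the following segment
    end_slope = c1 + 2 * c2 * h + 3 * c3 * h * h
    end_curvature = 2 * c2 + 6 * c3 * h
    return s, end_slope, end_curvature


# Check with a case where the answer is known: the previous segment is x**3
# on [0, 1], so at x = 1 its slope is 3 and its second derivative is 6.
segment, slope_at_2, curvature_at_2 = next_spline_segment(1.0, 1.0, 3.0, 6.0, 2.0, 8.0)
print(segment(1.5), slope_at_2, curvature_at_2)
```

In this check the data points lie on x**3, so the constructed segment reproduces x**3 on [1, 2]. The first segment still needs two extra conditions (for example a chosen start slope and curvature), which the slide does not specify.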

Changing Colors
Two techniques are possible:
1. CLUT animation: changing the Color Look-Up Table (CLUT) of the frame buffer. This changes the colors of the image.
2. New color information for each frame: with a frame buffer of 640 × 512 pixels, 8 bits/pixel and 30 frames per second, a complete update needs a data rate of 78.6 Mbit/sec.
The first technique is much faster than the second, since changing the CLUT requires the transmission of only 0.3 to 3 Kbytes (here technique 1 is on the order of 300 times faster than technique 2).
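
The figures on this slide can be checked with a short calculation. The frame buffer parameters are taken from the slide; the CLUT size of 256 entries with 3 bytes each is an assumption chosen to fall inside the quoted 0.3 to 3 Kbyte range.

```python
WIDTH, HEIGHT, BITS_PER_PIXEL, FPS = 640, 512, 8, 30

# Technique 2: every pixel of every frame is transmitted
frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
full_update_bits_per_sec = frame_bytes * 8 * FPS

# Technique 1 (assumption): 2**8 CLUT entries, 3 bytes (R, G, B) per entry
clut_bytes = 2 ** BITS_PER_PIXEL * 3

print(f"full frame: {frame_bytes} bytes")
print(f"full update: {full_update_bits_per_sec / 1e6:.1f} Mbit/sec")
print(f"CLUT: {clut_bytes} bytes, ratio {frame_bytes / clut_bytes:.0f} : 1")
```

Under these assumptions one full frame is several hundred times larger than one CLUT, which is where the speed advantage of CLUT animation comes from.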

Animation Languages
Categories of animation languages:
- Linear-list notations
  - Events are described by a starting and ending frame number and an action (event).
  - 17, 31, C, ROTATE HOUSE, 1, 45 means: between frames 17 and 31, rotate the object HOUSE around axis 1 by 45 degrees, determining the amount of rotation at each frame from table C.
- General-purpose languages
  - Embed animation capability within programming languages.
  - Values of variables serve as parameters to the routines that perform the animation.
  - E.g. ASAS, which is built on top of LISP: (grasp my-cube): the cube becomes the current object; (cw 0.05): spin it clockwise by a small amount.
- Graphical languages
  - Describe animation in a more visual way than textual languages.
  - Express, edit and comprehend the changes in an animation.
  - Explicit descriptions of actions are replaced by a picture of the action.
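
As an illustration of linear-list notation, the sketch below parses the example event from the slide into a small record. The field names are hypothetical, and since the contents of table C are not given, the per-frame rotation is assumed to be spread uniformly over the frame range.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start_frame: int
    end_frame: int
    table: str    # name of the table that distributes the action over frames
    action: str
    target: str
    axis: int
    amount: float  # total rotation in degrees


def parse_event(line: str) -> Event:
    """Parse one event of the form 'start, end, table, ACTION OBJECT, axis, amount'."""
    start, end, table, command, axis, amount = [p.strip() for p in line.split(",")]
    action, target = command.split()
    return Event(int(start), int(end), table, action, target, int(axis), float(amount))


def rotation_per_frame(event: Event) -> float:
    """Assumption: a uniform table, i.e. equal rotation on every frame step."""
    return event.amount / (event.end_frame - event.start_frame)


event = parse_event("17, 31, C, ROTATE HOUSE, 1, 45")
print(event)
print(f"{rotation_per_frame(event):.2f} degrees per frame step")
```

A non-uniform table (for example slow-in and slow-out) would replace `rotation_per_frame` with a lookup per frame.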

Controlling of Animation
Techniques for controlling animations (independent of the language that describes the animation):
- Full explicit control
  - The most complete form of control, because all aspects are defined: simple changes (scaling, translation, rotation) are specified, or key frames and interpolation methods (either explicit or by direct manipulation with mouse, joystick or data glove) are provided.
- Procedural control
  - Communication between objects to determine properties
  - Physically-based systems: the position of one object may influence the motion of another (a ball cannot pass through a wall)
  - Actor-based systems: actors pass their position to other actors to affect their behavior (actor A stays behind actor B)

Controlling of Animation
- Constraint-based systems
  - The natural way of moving from A to B is via a straight line, i.e. linearly. However, the motion is very often more complicated.
  - The movement of objects is determined by the other objects they are in contact with.
  - Compound motion may not be linear and is modeled by constraints (a ball follows a pathway).
- Tracking live action
  - Trajectories of animated objects are generated by tracking live action.
  - Rotoscoping: a film with real actors serves as template; designers draw over the film, change the background and replace the human actors with animated counterparts.
  - Indicators are attached to key points of the actor's body. Tracking the indicator positions provides key points in the animation model.
  - Another example: a data glove measures the position and orientation of the hand and the flexion and extension of fingers and finger parts. From this information we can calculate actions, e.g. movements.

Controlling of Animation
- Kinematics: description using the position and velocity of objects.
  - E.g. at time t = 0 the CUBE is at the origin. It moves with a constant acceleration of 0.5 m/s² for 2 sec. in the direction of (1, 1, 4).
  - Figure: kinematical description of the motion of a cube from (0, 0, 0) in the direction of (1, 1, 4) for 2 seconds with b = 0.5 m/s².
- Dynamics: takes into consideration the physical laws that define the kinematics.
  - E.g. at time t = 0 the CUBE is at position (0 meters, 100 meters, 0 meters) and has a mass of 5 kg. The force of gravity acts on the cube (result in this case: the cube will fall down).
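
Both examples can be turned into a short calculation. The sketch assumes that the cube starts at rest, that (1, 1, 4) only gives the direction of motion (it is normalized before use), that gravity is 9.81 m/s² along the negative y axis and that air resistance is ignored; none of these details are stated on the slide.

```python
import math


def position_constant_acceleration(start, direction, acceleration, t):
    """Kinematics: position after t seconds, starting at rest, moving along direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    distance = 0.5 * acceleration * t * t
    return tuple(s + distance * d / norm for s, d in zip(start, direction))


def falling_height(y0, t, g=9.81):
    """Dynamics: height of a body dropped from rest at height y0 (stops at the ground)."""
    return max(0.0, y0 - 0.5 * g * t * t)


def time_to_ground(y0, g=9.81):
    """Time until a body dropped from rest at height y0 reaches y = 0."""
    return math.sqrt(2.0 * y0 / g)


cube = position_constant_acceleration((0.0, 0.0, 0.0), (1, 1, 4), 0.5, 2.0)
print("kinematics, position after 2 s:", cube)
print("dynamics, height after 2 s:", falling_height(100.0, 2.0))
print("dynamics, time to reach the ground:", time_to_ground(100.0))
```

In the kinematic case the animator prescribes the motion itself; in the dynamic case only the initial state and the acting force are given and the trajectory follows from the physical law. In free fall without air resistance the 5 kg mass does not affect the result.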

Display of Animation
For the display of animations on raster systems, the animated objects have to be scan-converted to their pixmap in the frame buffer. This has to be done at least 10 (better: 20) times per second to give a reasonably smooth effect.
- Problem: a frame rate of 20 pictures/sec. requires manipulation, scan conversion and display of an object in only 50 msec. Scan conversion should use only a small fraction of these 50 msec, since other operations (erasing, redrawing, etc.) have to be done, too.
- Solution: double buffering. The frame buffer is divided into two images, each with half of the bits of the overall frame buffer ("pipeline"). While the operation (like rotating) and scan conversion are processed for the second half of the pixmap, the first half is displayed, and vice versa.
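
Double buffering can be sketched with two pixmaps and an index that says which one is currently displayed. The class and the trivial drawing routine below are illustrative stand-ins, not the interface of any particular graphics system.

```python
class DoubleBuffer:
    """Two pixmaps: one is displayed (front) while the other is drawn into (back)."""

    def __init__(self, width, height):
        self._buffers = [[[0] * width for _ in range(height)] for _ in range(2)]
        self._front = 0

    @property
    def front(self):
        return self._buffers[self._front]

    @property
    def back(self):
        return self._buffers[1 - self._front]

    def swap(self):
        """Make the freshly drawn back buffer visible."""
        self._front = 1 - self._front


def draw_frame(pixmap, frame_number):
    """Stand-in for scan conversion: erase, then set one pixel that moves right."""
    for row in pixmap:
        for x in range(len(row)):
            row[x] = 0
    pixmap[0][frame_number % len(pixmap[0])] = 1


fb = DoubleBuffer(width=4, height=2)
for n in range(3):
    draw_frame(fb.back, n)  # the viewer never sees a half-drawn image
    fb.swap()
    print("displayed row:", fb.front[0])
```

Erasing and redrawing always happen in the buffer that is not shown, so the display only ever switches between complete images.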

Transmission of Animation
- Symbolic representation
  - Graphical description (circle) of an animated object (ball) plus operations (roll)
  - The animation is displayed at the receiver by scan conversion of the objects to a pixmap.
  - The transmission rate is context dependent; it depends on:
    - the size of the symbolic representation structure and of the operation structure
    - the number of animated objects and of commands
- Pixmap representation
  - Longer data transmission times than with symbolic representation, because of the large data size of the pixmap
  - Shorter display times, because no scan conversion is necessary at the receiver side
  - Transmission rate = size of pixmap × frame rate (fixed transmission rate)

Conclusions
- NTSC and PAL as television standards
  - Widespread, but only belong to Enhanced Definition TV systems
  - Needed for better quality: High Definition TV (HDTV)
  - Problem: compression is needed for HDTV systems
- MPEG as standard for video and audio compression
  - High-quality video/audio compression based on JPEG techniques
  - Additionally: motion prediction between video frames
  - Newer versions (MPEG-4) achieve further compression by considering objects
- Video transmission
  - DVB as one standard for broadcasting SDTV, EDTV, HDTV or any MPEG content to the customer
- Animation
  - Technique for artificially creating videos