Multimedia Communication Systems
MULTIMEDIA SIGNAL CODING AND TRANSMISSION
DR. AFSHIN EBRAHIMI


Table of Contents
1 Introduction
1.1 Concepts and terminology
1.1.1 Signal representation by source coding
1.1.2 Optimization of transmission
1.2 Signal sources and acquisition
1.3 Digital representation of multimedia signals
1.3.1 Image and video signals
1.3.2 Speech and audio signals
1.3.3 Need for compression technology

Table of Contents
2 Fundamentals of Signal Processing and Statistics
2.1 Signals and systems
2.1.1 Elementary signals
2.1.2 Systems operations
2.2 Signals and Fourier spectra
2.2.1 Two- and multi-dimensional spectra
2.2.2 Spatio-temporal signals
2.3 Sampling of multimedia signals
2.3.1 Separable two-dimensional sampling
2.3.2 Sampling of video signals
2.4 Digital signal processing in multiple dimensions
2.5 Statistical analysis
2.5.1 Sample statistics
2.5.2 Joint statistical properties
2.5.3 Spectral properties of random signals
2.5.4 Markov chain models
2.5.5 Statistical foundations of information theory

Table of Contents
2.6 Linear prediction
2.6.1 Autoregressive models
2.6.2 Linear prediction
2.7 Linear block transforms
2.7.1 Orthogonal basis functions
2.7.2 Basis functions of orthogonal transforms
2.7.3 Efficiency of transforms
2.7.4 Transforms with block overlap
2.8 Filterbank transforms
2.8.1 Properties of subband filters
2.8.2 Implementation of filterbank structures
2.8.3 Discrete wavelet transform (DWT)
2.8.4 Two- and multi-dimensional filter banks
2.8.5 Pyramid decomposition

Table of Contents
3 Perception and Quality
3.1 Properties of vision
3.1.1 Physiology of the eye
3.1.2 Sensitivity functions
3.1.3 Color vision
3.2 Properties of hearing
3.2.1 Physiology of the ear
3.2.2 Sensitivity functions
3.3 Quality metrics
3.3.1 Objective signal quality metrics
3.3.2 Subjective assessment

Table of Contents
4 Quantization and Coding
4.1 Scalar quantization and pulse code modulation
4.2 Coding theory
4.2.1 Source coding theorem and rate-distortion function
4.2.2 Rate-distortion function for correlated signals
4.2.3 Rate-distortion function for multi-dimensional signals
4.3 Rate-distortion optimization of quantizers
4.4 Entropy coding
4.4.1 Properties of variable-length codes
4.4.2 Huffman codes
4.4.3 Systematic variable-length codes
4.4.4 Arithmetic coding
4.4.5 Adaptive and context-dependent entropy coding
4.4.6 Entropy coding and transmission errors
4.4.7 Lempel-Ziv coding
4.5 Vector quantization (VQ)
4.5.1 Basic principles of VQ
4.5.2 VQ with uniform codebooks
4.5.3 VQ with non-uniform codebooks
4.5.4 Structured codebooks
4.5.5 Rate-constrained VQ

Table of Contents
5 Methods of Signal Compression
5.1 Binary signal coding
5.1.1 Run-length coding
5.2 Predictive coding
5.2.1 Open-loop and closed-loop prediction systems
5.2.2 Non-linear and shift-variant prediction
5.2.3 Effects of transmission losses
5.2.4 Vector prediction
5.2.5 Prediction in multi-resolution pyramids
5.3 Transform coding
5.3.1 Gain through discrete transform coding
5.3.2 Quantization of transform coefficients
5.3.3 Coding of transform coefficients
5.3.4 Transform coding under transmission losses
5.4 Bitstreams with multiple decoding capability
5.4.1 Simulcast and transcoding
5.4.2 Scalable coding
5.4.3 Multiple-description coding

Table of Contents
6 Still Image Coding
6.1 Compression of binary images
6.1.1 Compression of bi-level images
6.1.2 Binary shape coding
6.1.3 Contour shape coding
6.2 Vector quantization of images
6.3 Predictive coding
6.3.1 2D prediction
6.3.2 2D vector prediction
6.3.3 Quantization and encoding of prediction errors
6.3.4 Error propagation in 2D DPCM
6.4 Transform coding of images
6.4.1 Block transform coding
6.4.2 Overlapping-block transform coding
6.4.3 Subband and wavelet transform coding
6.4.4 Local adaptation of transform bases by signal properties
6.5 Synthesis based image coding
6.5.1 Region-based coding
6.5.2 Colour and texture synthesis
6.5.3 Post filtering
6.6 Still image coding standards

Table of Contents
7 Video Coding
7.1 Intraframe-only and frame replenishment coding
7.2 Hybrid video coding
7.2.1 Motion-compensated hybrid coders
7.2.2 Characteristics of interframe prediction error signals
7.2.3 Quantization error feedback and error propagation
7.2.4 Reference pictures in motion-compensated prediction
7.2.5 Accuracy of motion compensation
7.2.6 Hybrid coding of interlaced video signals
7.2.7 Optimization of hybrid encoders
7.3 Spatio-temporal transform coding
7.3.1 Interframe transform and subband coding
7.3.2 Motion-compensated temporal filtering
7.3.3 Quantization and encoding of MCTF frames
7.4 Coding of side information (motion, modes)
7.5 Scalable video coding
7.5.1 Scalable hybrid coding
7.5.2 Scalable 3D frequency coding
7.6 Multi-view video coding
7.7 Synthesis based video coding
7.7.1 Region-based video coding
7.7.2 Distributed source coding
7.7.3 Super-resolution synthesis
7.7.4 Dynamic texture synthesis
7.8 Video coding standards

Table of Contents
8 Speech and Audio Coding
8.1 Coding of speech signals
8.1.1 Linear predictive coding
8.1.2 Parametric (synthesis) coding
8.1.3 Speech coding standards
8.2 Audio (music and sound) coding
8.2.1 Transform coding of audio signals
8.2.2 Synthesis based coding of audio and sound signals
8.2.3 Coding of stereo and multi-channel audio signals
8.2.4 Music and sound coding standards

Table of Contents
9 Transmission and Storage of Multimedia Data
9.1 Digital multimedia services
9.2 Network interfaces
9.3 Adaptation to channel characteristics
9.3.1 Rate and transmission control
9.3.2 Error control
9.4 Definitions at systems level
9.5 Digital broadcast
9.6 Media streaming

Introduction
Multimedia communication systems are a flagship of the information technology revolution. The combination of multiple information types, in particular audiovisual information (speech/audio/sound/image/video/graphics), with abstract (text), olfactory or tactile information provides new degrees of freedom in the exchange, distribution and acquisition of information. Communication includes the exchange of information between different persons, between persons and machines, or between machines only. Sufficient perceptual quality must be provided, which relates to compression and its interrelationship with transmission over networks. Advanced methodologies are based on content analysis and identification, which is of high importance for automatic user assistance and interactivity. In multimedia communication, concepts and methods from signal processing, systems and communications theory play a dominant role, and audiovisual signals are the primary challenge regarding transmission, storage and processing complexity.

Introduction
Books:
Steinmetz, R.; Nahrstedt, K.: Media Coding and Content Processing. Prentice Hall, 2002.
Steinmetz, R.; Nahrstedt, K.: Multimedia Systems. Springer Verlag, 2004.
Steinmetz, R.; Nahrstedt, K.: Multimedia Applications. Springer Verlag, 2004.
Ohm, J.-R.: Multimedia Communication Technology. Springer, 2004.
Magazines:
Multimedia Systems, ACM/Springer
Multimedia Magazine, IEEE

Introduction
What is Multimedia?
A simple definition: Multi + Media, i.e. any kind of system that supports more than one kind of medium. Is television multimedia?
Multi: many, much, multiple. Medium: a means to distribute and represent information.
Definition: Multimedia means the integration of continuous media (e.g., audio, video) and discrete media (e.g., text, graphics, images), through which digital information can be conveyed to the user in an appropriate way.

Facets of Medium
1. Perception Medium - How do humans perceive information in a computer environment? (by seeing, by hearing, ...)
2. Representation Medium - How is the information encoded in the computer? (ASCII, PCM, MPEG, ...)
3. Presentation Medium - Which medium is used to output information from the computer or to bring it into the computer? (Input: keyboard, microphone, camera, ...)
4. Storage Medium - Where is the information stored?
5. Transmission Medium - Which kind of medium is used to transmit the information? (copper cable, radio, ...)
6. Information Exchange Medium (combination of storage and transmission media) - Which information carrier will be used for information exchange between different locations?

Classification of Media
Each medium defines representation values and a representation space.
Representation values determine the information representation of different media:
- continuous representation values (e.g. electro-magnetic waves)
- discrete representation values (e.g. characters of a text in digital form)
The representation space determines the technique used to output the media information, usually visually (e.g. paper, slideshow) or acoustically (e.g. speakers).
Spatial dimensions: two-dimensional (2D graphics) or three-dimensional (holography).
Temporal dimensions: time-independent (document) - discrete media (e.g. text of a book); time-dependent (movie) - continuous media (e.g. sound, video).

Data Streams
When transmitted or played out, continuous media need a changing set of data over time, i.e. data streams. How to deal with such streams?
Asynchronous transmission: suitable for communication with no time restrictions (discrete media), e.g. electronic mail.
Synchronous transmission: the beginning of a transmission may only take place at well-defined times; a clock signal runs the synchronization between sender and receiver.
Isochronous transmission: periodic transmissions, where the time separation between subsequent transmissions is a multiple of a certain unit interval. A maximum and a minimum end-to-end delay for each packet of a data stream (limited jitter) is required. An end-to-end network connection is isochronous if it has a guaranteed bit rate and if the jitter is also guaranteed and small; a small sketch of such a jitter check follows below.
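To make the jitter criterion concrete, here is a minimal Python sketch; the function name, the hypothetical arrival times, the 20 ms nominal period and the 1 ms bound are illustrative assumptions, not part of the lecture:

```python
# Minimal sketch: test a packet stream against an isochronous jitter bound.
# Arrival times, nominal period and the 1 ms bound are illustrative values.

def max_jitter(arrival_times_ms, nominal_period_ms):
    """Largest deviation of an inter-arrival gap from the nominal period."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    return max(abs(g - nominal_period_ms) for g in gaps)

arrivals = [0.0, 20.3, 39.8, 60.1, 80.0]     # packets nominally every 20 ms
jitter = max_jitter(arrivals, 20.0)
print(f"max jitter: {jitter:.1f} ms")        # 0.5 ms for this example
print("isochronous under a 1 ms bound:", jitter <= 1.0)
```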

Data Stream Characteristics
Strongly periodic data streams: identical intervals T between consecutive units, (optimally) no jitter. Example: uncompressed audio.
Weakly periodic data streams: the intervals vary, but the variation repeats periodically (T1, T2, T3, T1, T2, T3, ...). Example: segmented transmission.
Aperiodic data streams: arbitrary intervals (T1, T2, ..., T6). Example: transmission of mouse control signals.

Data Stream Characteristics
Strongly regular data streams: the data quantity per unit remains constant during the entire lifetime of the stream (D1, D1, D1, ...). Typical for uncompressed video/audio.
Weakly regular data streams: the quantity varies periodically (D1, D1, D2, D2, D3, D3, ...). Can result from some compression techniques, e.g. videos coded with MPEG.
Irregular data streams: the quantity is neither constant nor periodically changing (D1, D2, D3, ..., Dn). Typical for compressed audio/video; harder to transmit and process.

Data Stream Characteristics
Continuous media consist of a time-dependent sequence of individual information units: Logical Data Units (LDUs).
Example: Symphony. A symphony consists of independent movements, and movements consist of scores. Using e.g. PCM, 44,100 samples are taken per second. On a CD, samples are grouped into units with a duration of 1/75 second. Possible LDUs with different granularity: movements, scores, groups, samples. In digital signal processing, sampling values are used as LDUs; the calculation below makes the group size concrete.
Example: Movie. A movie consists of scenes represented by clips, clips consist of single frames, and frames consist of blocks of e.g. 16x16 pixels. Pixels can consist of chrominance and luminance values. Using e.g. MPEG, inter-frame coding is used, thus image sequences are the smallest sufficient LDUs. (Hierarchy: Movie - Clips - Frames - Blocks - Pixels)
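As a worked check of the group-level LDU size quoted above (44,100 samples per second, groups of 1/75 second), a short calculation suffices:

```python
# LDU granularity for CD audio, using the values from the slide.
sampling_rate = 44_100                  # PCM samples per second
group_duration = 1 / 75                 # seconds per group (CD sector timing)
print(sampling_rate * group_duration)   # 588.0 samples form one group-level LDU
```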

Fields of the Lecture

Content
Basics: Audio Technology; Images and Graphics; Video and Animation
Multimedia Systems - Communication Aspects and Services: Voice over IP, Video Conferencing; Group Communication, Synchronization; Quality of Service and Resource Management
Multimedia Systems - Storage Aspects: Optical storage media; Multimedia file systems, Multimedia databases
Multimedia Usage: Design and User Interfaces, Abstractions for Programming

Concepts and terminology

Concepts and terminology
The classical model assumes that independent optimization of source and channel coding yields the optimum solution. However, a source coding method which achieves optimum compression can be extremely sensitive to errors occurring in the channel, e.g. due to feedback of previous reconstruction errors into the decoding process. This requires joint optimization of the entire chain, such that the best quality is retained for the user while the rate to be transmitted over the physical channel is made as low as possible.
The classical model also assumes a passive receiver ('sink'), which is very much related to broadcast services. In multimedia systems, the user can interact and can influence any part of the chain, even the signal generation itself; this is reflected by providing a back channel, which can also be used by automatic mechanisms serving the user with best-quality services. Instead of transmitter and receiver, denoting the devices at the front and back ends as server and client better reflects this new paradigm.

Concepts and terminology
The classical model assumes one monolithic channel for which the optimization of source coding, channel coding and modulation is made once. Multimedia communication mostly uses heterogeneous networks with largely varying characteristics; as a consequence, it is desirable to consider the channels at a more abstract level and perform proper adaptation to the instantaneous channel characteristics. Channels can be networks or storage devices. Recovery at the client side may include analysis which goes far beyond traditional channel coding, e.g. by conveying loss characteristics to the server via the back channel.
Multimedia services are becoming more 'intelligent', including elements of signal content analysis to assist the user. This includes support for content-related interaction and support in finding the multimedia information which best serves the needs of the user. Hence, the information source is not just encoded at the front end; more abstract analysis can be performed in addition, and the encoded representation may also include meta information about the content.
Multimedia communication systems are typically distributed systems, which means that the actual processing steps are performed at different places. Elements of adaptation of the content to the needs of the network, to the client configuration, or to the user's needs can be found anywhere in the chain. Finally, temporary or permanent storage of content can also reside anywhere, as storage elements are a specific type of channel, intended for later review instead of instantaneous transmission.

Concepts and terminology: Quality of Service (QoS)
The QoS relating to network transmission includes aspects like transmission bandwidth, delay and losses. It contributes indirectly to the perceived quality. This is denoted as Network QoS.
The QoS relating to perceived signal quality covers the entire transmission chain, including the compression performance of source encoding/decoding and its interrelationship with the channel characteristics. This is denoted as Perceptual QoS. An overview of measurement methods is given in Appendix A.1.
The QoS relating to the overall service quality is at the highest level. It includes aspects like the level of user satisfaction with the content itself, but also satisfaction concerning additional services, e.g. how well an adaptation to the user's needs is made. Some methods used to express this category of QoS with regard to content identification are described in Appendix A.2. This may be denoted as Semantic QoS.

Signal representation by source coding
Through multimedia signal compression, systems for transmission and storage of multimedia signals shall generate the most compact representation possible while achieving the highest possible perceptual quality. Immediately after capture, the signal is converted into a digital representation having a finite number of samples and amplitude levels. This step already influences the final quality. If the range of rates that a prospective channel can convey, or the resolution required by an application, is not known at the time of acquisition, it is advisable to capture the signal at the highest possible quality and scale it later.
In the source coder, the data rate needed for the digital representation shall be reduced as much as possible. Properties of the signal which allow reduction of the rate can be expressed in terms of redundancy (e.g. the typically expected similarity of samples from the signal). The perceived quality of the overall system is determined by the purpose of consumption at the end of the chain. If the sink is a human observer, it is useful to adapt the source coding method to the perceptual properties of humans, as it would be useless to convey a finer granularity of quality than the user can (or would desire to) perceive. In advanced methods of source coding, content-related properties can also be taken into consideration, e.g. by putting more emphasis on parts or pieces of the signal in which the user is expected to be most interested.

Signal representation by source coding
The encoded information is usually represented in the form of binary digits (bits). The bit rate is measured either in bit/sample or in bit per second (bit/s), where the latter results from the bit/sample ratio multiplied by the samples/s (the sampling rate). An important criterion to judge the performance of a source coding scheme is the compression ratio: the ratio between the bit rate necessary for representation of the uncompressed source and its compressed counterpart. If, e.g., for digital TV the uncompressed source requires 165 Mbit/s and the rate after compression is 4 Mbit/s, the compression ratio is 165:4 = 41.25. If compressed signal streams are stored as files on computer discs, the file size can be evaluated to judge the compression performance. When translating into bit rates, it must be observed that file sizes are often measured in KByte, MByte etc., where one Byte consists of 8 bit, 1 KByte = 1,024 Byte, 1 MByte = 1,024 KByte etc.
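The arithmetic above, plus the file-size translation, in a short Python sketch; the 700 MByte file and the one-hour duration are hypothetical values chosen only to exercise the unit conversions:

```python
# Compression ratio from the digital TV example on the slide.
uncompressed_rate = 165e6                    # bit/s
compressed_rate = 4e6                        # bit/s
print(uncompressed_rate / compressed_rate)   # 41.25

# Translating a stored file into an average bit rate
# (1 Byte = 8 bit, 1 KByte = 1,024 Byte, 1 MByte = 1,024 KByte).
file_size_mbyte = 700                        # hypothetical file size
duration_s = 3600                            # hypothetical playback duration
bits = file_size_mbyte * 1024 * 1024 * 8
print(f"{bits / duration_s / 1e6:.2f} Mbit/s average")   # ~1.63 Mbit/s
```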

Signal representation by source coding

Signal representation by source coding
Signal analysis: important principles here are prediction of signals and frequency analysis by transforms. In coding applications, the analysis step shall be reversible; by a complementary synthesis performed at the decoder, the signal shall be reconstructed with as much fidelity as possible. Hence, typical approaches of signal analysis used in coding are reversible transformations of the signal into equivalent forms, by which the encoded representation is as free of redundancy as possible. If linear systems or transforms are used for this purpose, the removal of redundancy is often called decorrelation, as correlation expresses linear statistical dependencies between signal samples. To optimize such systems, the availability of good and simple models reflecting the properties of the signal is crucial. Methods of signal analysis can also be related to the generation (e.g. properties of the acquisition process) and to the content of signals. Besides the samples of the signal or its equivalent representation, additional side-information parameters can be generated by the analysis stage, such as adaptation parameters needed during decoding and synthesis.
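A minimal sketch of reversible analysis/synthesis by linear prediction, assuming a simple first-order predictor (all names and sample values are illustrative): the residual of a correlated signal is small, which is what the subsequent quantization and encoding exploit.

```python
# First-order linear prediction as a reversible analysis step (illustrative).

def predict_residual(samples, a=1):
    """Open-loop analysis: residual e[n] = x[n] - a * x[n-1]."""
    return [samples[0]] + [x - a * p for p, x in zip(samples, samples[1:])]

def reconstruct(residual, a=1):
    """Complementary synthesis at the decoder: inverts the analysis exactly."""
    out = [residual[0]]
    for e in residual[1:]:
        out.append(e + a * out[-1])
    return out

x = [100, 102, 103, 105, 104, 106]
e = predict_residual(x)
assert reconstruct(e) == x    # reversibility, as required for coding
print(e)                      # [100, 2, 1, 2, -1, 2]: decorrelated, small values
```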

Signal representation by source coding
Quantization: maps the signal, its equivalent representation or additional parameters into a discrete form. If the required compression ratio does not allow lossless reconstruction of the signal at the decoder output, perceptual properties or circumstances of usage should be considered during quantization to retain as much of the relevant information as possible.
Bit-level encoding: aims to represent the discrete set of quantized values at the lowest possible rate. The optimization of encoding is mostly performed on the basis of statistical criteria.
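A minimal sketch of the quantization step, assuming a plain uniform scalar quantizer (step size and sample value are illustrative); the bit-level encoder would then assign short codewords to frequent indices:

```python
# Uniform scalar quantization and reconstruction (illustrative parameters).

def quantize(x, step):
    """Map a continuous amplitude to a discrete index (the lossy step)."""
    return round(x / step)

def dequantize(index, step):
    """Decoder-side mapping of the index back to an amplitude."""
    return index * step

step = 0.25
x = 0.8371
idx = quantize(x, step)            # index 3
print(dequantize(idx, step))       # 0.75 -> quantization error ~ 0.087
# A bit-level encoder (e.g. Huffman or arithmetic coding) would represent
# indices such as `idx` with as few bits as possible, based on statistics.
```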

Signal representation by source coding
Important parameters in optimizing a source coding algorithm are rate, distortion, latency and complexity. These parameters influence each other. The relationship between rate and distortion is determined by the rate-distortion function, which gives a lower bound on the rate if a certain maximum distortion limit is required. Improved rate/distortion performance (i.e. improved compression ratio at constant distortion) can usually be achieved by increasing the complexity of the encoding/decoding algorithm. Alternatively, increased latency also helps to increase compression performance; if, for example, an encoder is able to look ahead at the effects of current decisions on future encoding steps, this provides an advantage.
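As a standard textbook illustration of this bound (not stated on the slide itself), the memoryless Gaussian source with variance sigma^2 under squared-error distortion has the closed-form rate-distortion function:

```latex
% Rate-distortion function of a memoryless Gaussian source, variance \sigma^2,
% under mean-squared-error distortion (classical result, added for illustration):
R(D) = \begin{cases}
  \dfrac{1}{2}\log_2\dfrac{\sigma^2}{D}, & 0 < D \le \sigma^2,\\[4pt]
  0, & D > \sigma^2.
\end{cases}
```

Each additional bit of rate thus reduces the achievable distortion by a factor of four (about 6 dB), a useful yardstick for practical quantizer/coder combinations.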

Optimization of transmission
The interface between the source coder and the channel is also of high importance for the overall Perceptual QoS:
- the source encoder removes redundancy from the signal,
- the channel encoder adds redundancy to the bit stream for the purpose of protection and recovery in case of losses.
At the receiver side, the channel decoder removes the redundancy inserted by the channel encoder, while the source decoder re-inserts the redundancy which was removed by the source encoder. In this sense, source encoding and channel decoding perform similar operations, as do channel encoding and source decoding. The more complex part is usually on the side where redundancy is removed, which means finding the relevant information within an overcomplete representation. Source and channel encoding play counteracting roles and should be optimized jointly for optimum performance.
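A toy illustration of these counteracting roles, assuming run-length coding as the source coder and a simple repetition code as the channel coder (both chosen here for brevity; the lecture does not prescribe them):

```python
# Source coding removes redundancy; channel coding re-adds controlled redundancy.

def rle_encode(bits):
    """Run-length source coding: runs of identical symbols -> (symbol, length)."""
    runs, prev, count = [], bits[0], 0
    for b in bits:
        if b == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = b, 1
    runs.append((prev, count))
    return runs

def repetition_encode(symbols, n=3):
    """Repetition channel coding: n copies per symbol allow the decoder to
    recover isolated transmission errors by majority vote."""
    return [s for s in symbols for _ in range(n)]

data = "0000001111100111"
print(rle_encode(data))               # [('0', 6), ('1', 5), ('0', 2), ('1', 3)]
print(repetition_encode(["a", "b"]))  # ['a', 'a', 'a', 'b', 'b', 'b']
```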

Optimization of transmission
In the context of multimedia systems it is often advantageous to view the channel as a 'black box' for which a model exists. This in particular concerns error/loss characteristics, bandwidth, delay (latency) etc., which are the most important parameters of Network QoS. When parameters of Network QoS are guaranteed by the network, adaptation between source coding and the network transmission can be made in an almost optimum way; this is usually done by negotiation protocols. If no Network QoS is supported, specific mechanisms can be introduced for adaptation at the server and client sides. This includes application-specific error protection based on estimated network quality, or usage of re-transmission protocols. Introducing latency is also a viable method to improve the transmission quality, e.g. by optimization of transmission schedules, temporary buffering of information at the receiver side before presentation starts, or scrambling/interleaving of streams when bursty losses are expected.
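A minimal sketch of the interleaving idea mentioned above: writing symbols row-wise into a matrix and sending them column-wise spreads a burst of consecutive losses across many protected blocks, at the cost of added latency. Dimensions and data are illustrative:

```python
# Block interleaving: trade latency for robustness against bursty losses.

def interleave(symbols, rows, cols):
    """Write row-wise into a rows x cols matrix, read out column-wise."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

packets = list(range(12))
print(interleave(packets, rows=3, cols=4))
# [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]: a burst hitting three consecutive
# symbols on the channel now damages three different rows, each of which can
# be repaired by its own (row-wise) error protection.
```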

Optimization of transmission
Today's digital communication networks as used for multimedia signal transmission are based on the definition of distinct layers with clearly defined interfaces. On top of the physical transmission layer, a hierarchy of protocol stacks performs the adaptation up to the application layers. In such a configuration, optimization over the entire transmission chain can only be achieved by cross-layer signaling, which however imposes additional complexity on the transmission.

Signal sources and acquisition
Multimedia systems mainly process digital representations of signals, while the acquisition and generation of natural signals is in many cases not directly performed by a digital device; electro-magnetic (microphone), optical (lens) or chemical (film) media may be involved. In such cases, the properties of the digital signal are influenced by the signal conversion process during acquisition. The analog-to-digital conversion itself consists of a sampling step, which maps a spatio-temporally continuous signal into discrete samples, and a quantization step, which maps an amplitude-continuous signal into numerical values.
If natural signals are captured, part of the information originally available in the outside (three-dimensional) world is lost due to:
- limited bandwidth or resolution of the acquisition device;
- "non-pervasiveness" of the acquisition device, which resides at a singular position in the 3D exterior world, such that the properties of the signal are available only for this specific viewing or listening point; a possible solution is the usage of multiple cameras or microphones, though acquisition of 3D spatial information will always remain incomplete.

Signal sources and acquisition

Signal sources and acquisition
In digital imaging the signal is also sampled in the horizontal dimension, and it is converted (quantized) into numerical values instead of continuous-amplitude electrical signals. The image plane of width S1 and height S2 is mapped into N1 x N2 discrete sampling locations and represents one frame within a time-dependent sequence. Sampled and spatially bounded images can be expressed as matrices. Often, in the indexing of the samples, the top-left pixel of the image is assigned coordinate (0,0) and is the top-left element of the matrix as well.
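A minimal sketch of this matrix view, assuming numpy is available (sizes are illustrative): rows index the vertical direction, so the matrix is N2 x N1 and the top-left sample is frame[0, 0].

```python
import numpy as np

N1, N2 = 8, 6                                # samples horizontally / vertically
frame = np.zeros((N2, N1), dtype=np.uint8)   # matrix with the row index first
frame[0, 0] = 255                            # top-left pixel, coordinate (0, 0)
print(frame.shape)                           # (6, 8)
```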

Signal sources and acquisition
In analog video technology (and still in the first generations of digital video cameras), interlaced acquisition is widely used, where the even and odd lines are captured at different time instants. Here, a video frame consists of two fields, each containing only half the number of lines. This method incurs a time shift between the even and odd lines of the composite frames. When the entire frame is captured simultaneously (as done by movie cameras), the acquisition is progressive. It is expected that in the future most content will be captured progressively.
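A short sketch of the frame/field relation, assuming numpy (the frame content is illustrative); in real interlaced acquisition the two fields would stem from different time instants:

```python
import numpy as np

frame = np.arange(6 * 4).reshape(6, 4)   # a frame with 6 lines of 4 samples
top_field = frame[0::2, :]               # even lines 0, 2, 4
bottom_field = frame[1::2, :]            # odd lines 1, 3, 5
print(top_field.shape, bottom_field.shape)   # (3, 4) (3, 4): half the lines each
```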

Digital representation of multimedia signals
The process of digitizing a signal consists of sampling (see sec. 2.2) and quantization (see chapter 4 for details). The resulting 'raw' digital format is denoted as the Pulse Code Modulation (PCM) representation. These formats are often regarded as the original references in digital multimedia signal processing applications. To capture and represent color images, the most common representation consists of three primary components of active light: red (R), green (G) and blue (B). These components are separately acquired and sampled, which results in a sample count three times higher than for monochrome images. True representation of color may even require more components in a multi-spectral representation. Color images and video are often represented by a luminance component Y and two chrominance (color difference) components. For the transformation between R,G,B and luminance/chrominance representations, different definitions exist, depending on the particular application domain.

Digital representation of multimedia signals
For example, in standard TV resolution video, the transform of ITU-R BT.601 is mainly used, with luminance Y = 0.299 R + 0.587 G + 0.114 B and chrominances formed as scaled color differences Cb ~ (B - Y) and Cr ~ (R - Y). For high definition (HD) video formats, the ITU-R BT.709 transform with luminance weights 0.2126, 0.7152 and 0.0722 is more commonly used. The possible color variations in the R,G,B color space are restricted such that perceptually and statistically more important colors are represented more accurately. Chrominance components are in addition usually sub-sampled, which is reasonable as the human visual sense is not capable of perceiving differences in color at the same high spatial resolution as for the luminance component.
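A sketch of the BT.601 conversion with the standard scale factors for the color differences (0.564 and 0.713), on normalized components in [0, 1] and without offsets or 8-bit scaling; this is the textbook form, not necessarily the exact equation from the original slide:

```python
# ITU-R BT.601 RGB -> Y, Cb, Cr on normalized components (no offsets).

def rgb_to_ycbcr_601(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    cb = 0.564 * (b - y)                    # scaled color difference B - Y
    cr = 0.713 * (r - y)                    # scaled color difference R - Y
    return y, cb, cr

print(rgb_to_ycbcr_601(1.0, 1.0, 1.0))   # white: (1.0, 0.0, 0.0)
print(rgb_to_ycbcr_601(1.0, 0.0, 0.0))   # saturated red: Y = 0.299
# For HD material (ITU-R BT.709) the luminance weights change to
# 0.2126, 0.7152 and 0.0722, reflecting the different primaries.
```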

Digital representation of multimedia signals
In interlaced sampling, sub-sampling of the chrominances is mostly performed only in the horizontal direction, to avoid color artifacts in case of motion; for progressive sampling, the chrominances can be sub-sampled in both the horizontal and vertical directions. Component sampling ratios are often expressed in a notation C1:C2:C3 to express the relative numbers of samples. For example:
- when the same number of samples is used for all three components, as in R,G,B, the expression is '4:4:4';
- a Y,Cb,Cr sampling structure with horizontal-only sub-sampling of the two chrominances by a factor of 2 is expressed by the notation '4:2:2', while '4:1:1' indicates horizontal sub-sampling by a factor of 4;
- if sub-sampling is performed in both directions, i.e. half the number of samples in the chrominances along both the horizontal and vertical directions, the notation is '4:2:0'.
The respective source format standards also specify the sub-sampled component sample positions in relation to the luminance sample positions. The sketch below shows the resulting sample counts.
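A small helper that turns the notations above into total sample counts per frame, relative to a W x H luminance plane (the helper and the 720 x 576 example are illustrative additions):

```python
# Total samples per frame for common C1:C2:C3 sub-sampling schemes.

def total_samples(w, h, scheme):
    factors = {                 # (horizontal, vertical) chroma sub-sampling
        "4:4:4": (1, 1),
        "4:2:2": (2, 1),
        "4:1:1": (4, 1),
        "4:2:0": (2, 2),
    }
    fh, fv = factors[scheme]
    return w * h + 2 * (w // fh) * (h // fv)

w, h = 720, 576
for s in ("4:4:4", "4:2:2", "4:2:0"):
    print(s, total_samples(w, h, s))   # 1244160, 829440, 622080
```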

Digital representation of multimedia signals

Digital representation of multimedia signals
For video representation, besides the total number of bits required, e.g., to store a movie, the number of bits per second is important for transmission. The bit rate is obtained by multiplying the number of bits per frame by the number of frames per second rather than by the total number of frames.
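Carrying out this per-second calculation for an ITU-R BT.601 signal (720 x 576, 4:2:2, 8 bit/sample, 25 frames/s) reproduces the uncompressed digital TV rate quoted earlier; the format choice is an illustrative assumption, the arithmetic is the slide's:

```python
# Bits per frame times frames per second for BT.601 4:2:2 at 25 frames/s.
width, height = 720, 576
bits_per_sample = 8
luma = width * height
chroma = 2 * (width // 2) * height           # 4:2:2: two half-width chrominances
bits_per_frame = (luma + chroma) * bits_per_sample
print(bits_per_frame * 25 / 1e6, "Mbit/s")   # 165.888, i.e. the ~165 Mbit/s above
```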

Digital representation of multimedia signals
For standard TV resolution, the source of the digital TV signal is the analog TV signal of 625 lines in Europe (525 lines in Japan or the US), typically recorded by an interlaced scheme. These analog signals are sampled at a rate of 13.5 MHz for the luminance. After removal of the vertical blanking intervals, 576 (480) active lines remain. The horizontal blanking intervals (for line synchronization) are also removed, which gives around 704 active pixels per line. The digital formats listed in Tab. 1.2 store only those active pixels, with a very small overhead of a few surplus pixels from the blanking intervals. Japanese and US (NTSC) formats traditionally use 60 fields per second (30 frames per second), while in Europe 50 fields per second (25 frames per second) is used in analog TV (PAL, SECAM). The digital standards defining HD formats are more flexible in terms of frame and field rates, allowing 24, 25, 30, 50 or 60 frames/second and 50 or 60 fields/second; movie material, interlaced and progressive video are supported. For higher resolutions, the '720p' format (720 lines progressive) is widely used in professional digital video cameras. All 'true' HDTV formats have 1080 lines in the digital signal. There are other commonly used formats, some of which are generated by digitally down-converting the standard TV resolution, e.g. the half horizontal resolution (HHR), the Common Intermediate Format (CIF) or Standard Intermediate Format (SIF) and the Quarter CIF (QCIF). For computer displays or mobile devices, formats such as VGA and QVGA are also commonly used. Higher resolutions beyond HD are currently expected to emerge from the professional area (Digital Cinema) into consumer applications. Current plans are to introduce formats with double the number of samples horizontally/vertically as compared to HD1080, then called 4Kx2K, or with quadruple the number (8Kx4K). Those formats will only support progressive sampling, but frame rates may become even higher in the future (72 frames per second and beyond).
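For a rough comparison of the formats named above, the commonly used frame sizes (a compilation of standard values, added here as an illustration) translate into the following sample counts:

```python
# Luminance samples per frame for common video formats (standard sizes).
formats = {
    "QCIF":    (176, 144),
    "CIF":     (352, 288),
    "SD 625":  (720, 576),
    "HD 720p": (1280, 720),
    "HD 1080": (1920, 1080),
    "4Kx2K":   (3840, 2160),
}
for name, (w, h) in formats.items():
    print(f"{name:8s} {w * h / 1e6:6.2f} Mpixel per frame")
```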

Digital representation of multimedia signals
This figure gives a coarse impression of the sampled image areas supported by formats between QCIF and HDTV. An increased number of samples can be used either to increase the resolution (spatial detail) or to display scenes at a wider angle. For example, in a cinema movie, close-up views of human faces are rarely shown. Movies displayed on a cinema screen allow the observer's eye to explore the scene, while on standard definition TV screens, and even more so for the smaller formats, this capability is very limited. For medical and scientific purposes, digital images with much higher resolution than in movie production are used; resolutions of up to 10,000 x 10,000 = 100,000,000 pixels are quite common. Such formats are not yet realistic for real-time acquisition by digital video cameras, as the clock rates for sampling would be extremely high.

Digital representation of multimedia signals
Speech and audio signals: for audio signals, parameters such as sampling rate and precision (bit depth) have the strongest influence on the resulting data rates of the digital representation. These parameters highly depend on the properties of the signals and on the requirements for quality. In speech signal quantization, non-linear mappings using logarithmic amplitude compression are used, which for low amplitudes provide quantization noise as low as that of 12-bit linear quantization, even though only 8 bit/sample are used. For music signals at audio CD quality, a linear 16-bit representation is most common. For some specialized applications, even higher bit depths and higher sampling rates than for CD are used.
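A sketch of logarithmic amplitude compression, using the mu-law characteristic with mu = 255 as in ITU-T G.711 (the slide names the principle, not this specific law):

```python
import math

def mu_law_compress(x, mu=255.0):
    """Map x in [-1, 1] logarithmically, expanding resolution near zero."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

for x in (0.01, 0.1, 0.5, 1.0):
    print(f"x = {x:4.2f} -> {mu_law_compress(x):.3f}")
# Small amplitudes occupy a large share of the output range, which is why
# 8 bit/sample can match the low-amplitude noise of ~12-bit linear PCM.
```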

Digital representation of multimedia signals

Digital representation of multimedia signals
Need for compression technology: due to the tremendously high data rates necessary for representation of the original uncoded formats, the requirement for data compression by application of image, video and audio coding is permanently present, even though the available transmission bandwidth keeps increasing through advances in communications technology. In general, past experience has shown that multimedia traffic increases faster than new capacity becomes available, and compressed transmission of data is inherently cheaper. If sufficient bandwidth is available, it is used more efficiently, in terms of the quality that serves the user, by increasing the resolution of the signal. Further, certain types of communication channels (in particular in mobile transmission) exist where bandwidth is inherently expensive due to physical limitations. This must however be weighed against the complexity necessary for the implementation of a compression algorithm, which may lead to higher device cost and higher power consumption, the latter being particularly critical for mobile devices.